Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025 Volume 527, Pages 229–244 (Mi danma681)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Medical intelligence solution for population-based oncology prevention

P. A. Filonenko, V. N. Kokh, P. D. Blinov

Sber AI, Moscow, Russia

Abstract: Objective: To develop a scalable AI solution for population-scale cancer prevention that effectively detects malignant neoplasms (MNs) using the minimal necessary dataset from electronic health records (EHRs): medical diagnosis and service codes. The solution addresses the resource limitations of traditional MN screening methods while maintaining high efficiency in patient risk stratification.
Methods: The solution combines gradient boosting with survival models. Over 700 predictors are constructed from raw EHR events, including sociodemographic characteristics, visit patterns, clinical history, and event frequencies by diagnosis group. The key feature is the use of population-level (Kaplan-Meier estimates) and individual-level (accelerated failure time, AFT, model) risk characteristics as additional predictors for gradient boosting. Validation was conducted on data from over 2.5 million adult patients across 5 regions of the Russian Federation under the supervision of certified oncologists.
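As an illustration of the population-level part of the Methods, the following is a minimal sketch of a Kaplan-Meier (product-limit) estimate turned into a per-group risk feature. It is not the authors' implementation: the grouping of patients into strata, the 12-unit horizon, and all function names (`kaplan_meier`, `survival_at`, `km_risk_feature`) are assumptions made for the example, and the resulting value would be only one extra column passed to a gradient boosting model.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct event time.

    durations: follow-up time per patient; events: 1 if the event was observed,
    0 if the patient was censored at that time.
    """
    observed = defaultdict(int)  # events observed at each time
    exits = defaultdict(int)     # events + censorings at each time
    for t, e in zip(durations, events):
        exits[t] += 1
        if e:
            observed[t] += 1

    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = observed[t]
        if d:
            # Patients censored at t still count as at risk at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


def survival_at(curve, horizon):
    """Step-function lookup of S(horizon); S = 1 before the first event."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


def km_risk_feature(records, horizon):
    """records: iterable of (group, duration, event).

    Returns {group: 1 - S_group(horizon)}, i.e. the estimated cumulative
    incidence in each (hypothetical) stratum by the given horizon.
    """
    durations = defaultdict(list)
    events = defaultdict(list)
    for group, t, e in records:
        durations[group].append(t)
        events[group].append(e)
    return {
        g: 1.0 - survival_at(kaplan_meier(durations[g], events[g]), horizon)
        for g in durations
    }
```

Each patient would then receive the value of their own stratum as a predictor; how the published solution actually defines strata and horizons is not stated in the abstract.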
Results: The solution achieves an Average Precision (AP) of 0.228, outperforming modern deep learning and large language model solutions, the best of which reaches an AP of 0.193. When forming a risk group comprising 1% of the population, the solution can identify 3.7–5.4 times more patients with MN using the same screening volume. In a 12-month retrospective study, the solution increased the number of detected MN cases by 91% and expanded regional MN coverage by 36 percentage points compared to current preventive health examination processes. The solution demonstrates high scalability: processing data for a city of one million takes less than three hours and requires no high-performance servers.
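The two kinds of figures reported in the Results, Average Precision and the yield of a top-1% risk group, can be computed from any ranked list of risk scores. The sketch below shows one common way to do so; it assumes that the "times more patients" figure is measured against the population base rate, which the abstract does not specify, and the function names are illustrative only.

```python
def _ranked(scores):
    """Indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(y_true, scores):
    """Mean of precision@k taken at every rank k that holds a positive case."""
    hits = 0
    total = 0.0
    for rank, i in enumerate(_ranked(scores), start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def lift_at_fraction(y_true, scores, fraction):
    """Positive rate in the top `fraction` of patients divided by the base rate.

    Assumption: the comparison baseline is the overall prevalence in y_true.
    """
    n = len(scores)
    positives = sum(y_true)
    if n == 0 or positives == 0:
        raise ValueError("need at least one patient and one positive case")
    k = max(1, int(n * fraction))  # size of the risk group
    top = _ranked(scores)[:k]
    top_rate = sum(y_true[i] for i in top) / k
    return top_rate / (positives / n)
```

For a 1% risk group, `fraction` would be `0.01`. Library implementations of AP may treat tied scores differently from this rank-by-rank version.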
Conclusions: The research presents a solution for scalable population-based MN prevention that uses exclusively medical diagnosis and procedure codes from EHRs. The system integrates naturally into existing medical workflows by directing at-risk patients to primary care physicians, who decide on oncologist referrals and additional examinations. Minimal data and computational resource requirements make the solution accessible for implementation across diverse healthcare systems, including remote regions with limited resources, opening new opportunities for improving the effectiveness of population-based MN prevention.

Keywords: population cancer prevention, malignant neoplasms, AI in medicine, retrospective study.

UDC: 004.9

Received: 14.08.2025
Accepted: 15.10.2025

DOI: 10.7868/S2686954325070197



© Steklov Math. Inst. of RAS, 2025