Journal of the Belarusian State University. Mathematics and Informatics, 2022 Volume 1, Pages 83–96 (Mi bgumi180)

Theoretical foundations of computer science

Methods of intellectual data analysis in COVID-19 research

O. V. Sen'ko (a, b), A. V. Kuznetsova (b, c), E. M. Voronin (b), O. A. Kravtsova (b, d), L. R. Borisova (e), I. L. Kirilyuk (f), V. G. Akimkin (b)

a Federal Research Center «Computer Science and Control», Russian Academy of Sciences, 44 Vavilova Street, 2 building, Moscow 119333, Russia
b Central Research Institute of Epidemiology, Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing, 3a Novogireevskaya Street, Moscow 111123, Russia
c Institute of Biochemical Physics named after N. M. Emanuel, Russian Academy of Sciences, 4 Kosygina Street, Moscow 119334, Russia
d Lomonosov Moscow State University, 1 Leninskie Gory, Moscow 119991, Russia
e Financial University under the Government of the Russian Federation, 49/2 Leningradskii Avenue, Moscow 125167, Russia
f Institute of Economics, Russian Academy of Sciences, 32 Nakhimovskii Avenue, Moscow 117218, Russia

Abstract: The paper presents an original method for relating the course of an epidemic to socio-economic, demographic and climatic factors. The method was applied to 110 countries using their COVID-19 growth rate curves for the period from January 2020 to August 2021. Hierarchical agglomerative clustering of these curves identified four large clusters with homogeneous curves, containing 11, 39, 17 and 13 countries, respectively; the remaining 30 countries were not included in any cluster. Using machine learning methods, we identified how the selected clusters differ in socio-economic, demographic, geographical and climatic indicators. The most important distinguishing indicators include the annual temperature amplitude, high-tech exports, the Gini coefficient, the size of the urban and total population, the net barter terms of trade index, population growth, average January temperature, land area, the number of deaths due to natural disasters, birth rate, coastline length, oil reserves, and population in urban agglomerations of more than 1 million. This approach (clustering combined with classification by methods of logical-statistical analysis) has not been used before. The patterns found will make it possible to predict the epidemiological process more accurately in countries belonging to different clusters. Supplementing the approach with autoregressive models will automate the forecast and improve its accuracy.
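
As a rough illustration of the clustering step described in the abstract, the sketch below groups a few synthetic growth rate curves by agglomerative clustering and leaves curves that join no group unassigned. The abstract does not state the distance measure, linkage criterion or stopping rule, so Euclidean distance, average linkage, the `threshold` value and the `min_size` cutoff are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length curves (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, curves):
    """Mean pairwise distance between members of two clusters (assumed linkage)."""
    total = sum(euclidean(curves[i], curves[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(curves, threshold, min_size=2):
    """Merge the closest pair of clusters until the closest pair is farther
    apart than `threshold`. Clusters smaller than `min_size` are reported
    as unassigned, mirroring countries that fall outside every cluster."""
    clusters = [[i] for i in range(len(curves))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], curves)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    large = sorted(sorted(c) for c in clusters if len(c) >= min_size)
    unassigned = sorted(i for c in clusters if len(c) < min_size for i in c)
    return large, unassigned


# Synthetic curves (not data from the paper): two tight groups and one outlier.
curves = [
    [0.10, 0.20, 0.30, 0.20],
    [0.11, 0.21, 0.29, 0.19],
    [0.09, 0.19, 0.31, 0.21],
    [0.50, 0.40, 0.10, 0.05],
    [0.52, 0.38, 0.12, 0.06],
    [0.90, 0.90, 0.90, 0.90],
]
large, unassigned = agglomerative_clusters(curves, threshold=0.1)
print("clusters:", large)
print("unassigned:", unassigned)
```

With real data, each row would hold one country's growth rate series over the study period, and the resulting cluster labels could serve as the target classes when comparing socio-economic, demographic and climatic indicators.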

Keywords: cluster analysis; machine learning methods; statistics; epidemiological process; COVID-19.

UDC: 004.4

Received: 31.12.2021
Revised: 04.03.2022
Accepted: 04.03.2022

DOI: 10.33581/2520-6508-2022-1-83-96


