Informatika i Ee Primeneniya [Informatics and its Applications]

Inform. Primen., 2019 Volume 13, Issue 3, Pages 34–40 (Mi ia607)

This article is cited in 2 papers

Hybrid extreme gradient boosting models to impute the missing data in precipitation records

A. K. Gorshenin (a, b), O. P. Martynov (b)

a Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
b Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russian Federation

Abstract: The article compares the classical extreme gradient boosting method implemented in the XGBoost (eXtreme Gradient Boosting) framework with a newer modification, CatBoost (Categorical Boosting), which is still rarely used in scientific research. Hybrid classification-regression models are proposed to improve the accuracy of imputing missing values in real precipitation data from 14 meteorological stations in Germany. The classification accuracy reaches up to 92%, and the root-mean-square errors are moderate. The hybrid methods outperform both the simple classification models and the simple regression models in prediction accuracy. The proposed approaches can be used for analyzing meteorological data with machine learning methods as well as for improving forecasting accuracy in physical models of atmospheric processes.
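The abstract does not spell out how the classification and regression stages are combined, so the following is only a minimal sketch of one common two-stage arrangement, not necessarily the authors' design. A classifier first decides whether a missing record was wet or dry, and a regressor trained on wet records only supplies the amount. The names `HybridImputer`, `ThresholdClassifier`, `MeanRegressor`, and `rmse`, the single feature (precipitation at a neighbouring station), and the toy numbers are all hypothetical; the two stand-in models are deliberately trivial so the example runs with the standard library alone. Any estimators exposing `fit` and `predict`, such as boosted classifiers and regressors, could be passed in their place.

```python
from math import sqrt
from statistics import mean


class ThresholdClassifier:
    """Hypothetical stand-in for a boosted classifier: wet (1) or dry (0)."""

    def fit(self, X, y):
        # Place the threshold midway between the class means of feature 0.
        wet = [x[0] for x, label in zip(X, y) if label == 1]
        dry = [x[0] for x, label in zip(X, y) if label == 0]
        self.threshold = (mean(wet) + mean(dry)) / 2
        return self

    def predict(self, X):
        return [1 if x[0] > self.threshold else 0 for x in X]


class MeanRegressor:
    """Hypothetical stand-in for a boosted regressor: predicts the training mean."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


class HybridImputer:
    """Two-stage sketch: classify wet/dry, then regress the amount on wet records."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        # Stage 1: the classifier learns whether any precipitation occurred.
        labels = [1 if amount > 0 else 0 for amount in y]
        self.classifier.fit(X, labels)
        # Stage 2: the regressor sees only records with precipitation.
        wet_X = [x for x, amount in zip(X, y) if amount > 0]
        wet_y = [amount for amount in y if amount > 0]
        self.regressor.fit(wet_X, wet_y)
        return self

    def predict(self, X):
        labels = self.classifier.predict(X)
        amounts = self.regressor.predict(X)
        # Records classified as dry are imputed as exactly zero.
        return [a if label == 1 else 0.0 for label, a in zip(labels, amounts)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and imputed values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Toy data (made up): feature = precipitation at a neighbouring station, mm.
X_train = [[0.0], [0.2], [3.0], [5.0]]
y_train = [0.0, 0.0, 2.0, 4.0]

model = HybridImputer(ThresholdClassifier(), MeanRegressor()).fit(X_train, y_train)
imputed = model.predict([[0.1], [4.0]])
error = rmse([0.0, 2.0], imputed)
```

Splitting the task this way lets the dry records, which typically dominate precipitation series, be handled as a classification problem, while the regressor is fitted only to the positive amounts.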

Keywords: data imputation, precipitation, classification, regression, gradient boosting, XGBoost, CatBoost.

Received: 08.07.2019

DOI: 10.14357/19922264190306


