Abstract:
This article addresses the problem of reconstructing missing data in data collections used for machine learning. We propose a new randomized method for missing data reconstruction based on entropy-robust estimation and the generation of ensembles of random variables. The method resembles the use of an auxiliary regression to reconstruct missing values, but, unlike the latter, entropy estimation imposes no additional constraints on the likelihood function of the errors in the sample and remains applicable to small amounts of data. This is especially relevant in problems where the training data are limited and the omissions are not systematic. The proposed method is applied to reconstruct missing data on the areas of thermokarst lakes in the Arctic zone of the Russian Federation, as measured from satellite images.
Keywords: missing data reconstruction, entropy-based estimation, randomized machine learning, thermokarst lake, Arctic.