Abstract:
A modified nonparametric algorithm for the automatic classification of large-volume statistical data is proposed. It detects classes that correspond to unimodal fragments of the probability density of a multidimensional random variable. The initial information is compressed by partitioning the multidimensional feature space into sampling intervals, which yields a data array composed of the interval centers and the frequencies with which values of the random variable fall into those intervals. From these data, a regression estimate of the probability density is synthesized. This estimate is the basis for the algorithmization of the automatic classification procedure. A class is a compact group of observations of a random variable corresponding to a unimodal fragment of the probability density. The computational efficiency of the modified algorithm is provided by the initial data compression procedure and by the improvement and algorithmization of the traditional nonparametric method for detecting compact groups of observations of a random variable. The effectiveness of the developed method is confirmed by the results of its application to the analysis of remote sensing data of forests damaged by the Siberian silkworm.
Keywords: nonparametric algorithm for automatic classification, regression estimation of probability density, discretization of the range of random variables, woodlands, remote sensing data.
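
The compression and density-estimation steps summarized in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. The function names `compress`, `regression_density` and `count_modes`, the Gaussian kernel, the equal-width intervals and the fixed `bandwidth` are all assumptions made for illustration; the paper treats the multidimensional case.

```python
import numpy as np


def compress(sample, bins):
    """Replace a 1-D sample by interval centers and relative frequencies.

    Assumption: equal-width sampling intervals over the sample range.
    """
    counts, edges = np.histogram(sample, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    freqs = counts / counts.sum()
    return centers, freqs


def regression_density(x, centers, freqs, bandwidth):
    """Density estimate built from the compressed data only.

    Assumed form: a sum of Gaussian kernels placed at the interval
    centers and weighted by the interval frequencies.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - centers[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (kernel * freqs[None, :]).sum(axis=1) / bandwidth


def count_modes(density):
    """Count strict local maxima of a density evaluated on a grid.

    Each maximum marks one unimodal fragment, i.e. one candidate class.
    """
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner > density[2:])
    return int(np.sum(is_peak))


# Hypothetical toy data: two compact groups of observations.
sample = np.array([0.0, 0.1, 0.2, 3.8, 3.9, 4.0])
centers, freqs = compress(sample, bins=4)
grid = np.linspace(-3.0, 7.0, 1001)
density = regression_density(grid, centers, freqs, bandwidth=0.5)
n_classes = count_modes(density)
```

In this sketch the cost of evaluating the density depends on the number of intervals rather than on the sample size, which is the point of the compression step. The number of intervals and the bandwidth are free parameters here, and the sketch does not reproduce how the paper selects them.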