Abstract:
The error in identifying the sample distribution of a multidimensional discrete random variable among a library of reference distributions is studied as a function of the dimension of the random vector, the sample length, and the distance between two reference distributions in the C and L1 norms. It is shown that the recognition error in the L1 norm is significantly lower than in the C norm. As a practical application, reference distributions of $n$-grams in texts are considered. The identification accuracy turns out to be determined mainly by the individual characteristics of the reference distributions rather than by the distances between them. An algorithm is developed for testing a system of reference distributions for recognition accuracy.
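To make the setting concrete, the following minimal Python sketch identifies a sample $n$-gram distribution by picking the closest reference in either the C norm (largest absolute difference between probabilities) or the L1 norm (sum of absolute differences). The nearest-reference decision rule, the use of character $n$-grams, and the names `ngram_distribution`, `distance` and `identify` are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def ngram_distribution(text, n):
    """Empirical distribution of character n-grams in `text` (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        raise ValueError("text is shorter than n")
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def distance(p, q, norm):
    """Distance between two discrete distributions given as dicts.

    norm == "L1": sum of absolute differences.
    norm == "C":  maximum absolute difference (uniform norm).
    """
    keys = set(p) | set(q)
    diffs = [abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys]
    if norm == "L1":
        return sum(diffs)
    if norm == "C":
        return max(diffs, default=0.0)
    raise ValueError("norm must be 'L1' or 'C'")


def identify(sample, references, norm):
    """Return the name of the reference closest to `sample` (assumed decision rule)."""
    return min(references, key=lambda name: distance(sample, references[name], norm))


# Toy library of two reference distributions built from short strings.
references = {
    "a": ngram_distribution("abababababab", 2),
    "b": ngram_distribution("abcabcabcabc", 2),
}
sample = ngram_distribution("abcabcab", 2)
print(identify(sample, references, "L1"), identify(sample, references, "C"))
```

In this toy case both norms select the same reference; the abstract's claim concerns how often the two norms misidentify as the dimension, the sample length and the distance between references vary, which this sketch does not attempt to reproduce.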