
Keldysh Institute preprints, 2022, 043, 24 pp. (Mi ipmp3069)

This article is cited in 1 paper

Two-factor patterns construction in problems of texts classification

M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov


Abstract: Two-factor patterns of empirical bigram frequency distributions are constructed for machine classification of texts by author and subject. Text attributes are recognized by the nearest-neighbor method with respect to reference distributions, with the proximity between distributions measured in the L1 norm. The 'author-topic' pair of an unknown text is assigned as that of its nearest-neighbor pattern. The problems of recognizing the author regardless of the topic of the text, and the topic regardless of the author, are analyzed. The possibilities of aggregating and refining the classification features are also investigated.
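
The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (bigram_distribution, l1_distance, classify), the choice of letter bigrams over lowercased alphabetic characters, and the toy reference texts are all assumptions made for illustration. The abstract itself specifies only bigram frequency distributions, the L1 norm, and nearest-neighbor assignment of an 'author-topic' pair.

```python
from collections import Counter


def bigram_distribution(text):
    """Empirical distribution of letter bigrams in a text.

    Assumption: bigrams are adjacent pairs of alphabetic characters,
    lowercased, with all other characters dropped.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


def l1_distance(p, q):
    """L1 norm of the difference between two bigram distributions."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def classify(text, patterns):
    """Return the (author, topic) label of the nearest reference pattern.

    `patterns` maps (author, topic) pairs to reference distributions.
    """
    d = bigram_distribution(text)
    return min(patterns, key=lambda label: l1_distance(d, patterns[label]))


# Toy reference patterns (hypothetical labels and texts).
patterns = {
    ("author_A", "topic_x"): bigram_distribution("aaaa bbbb"),
    ("author_B", "topic_y"): bigram_distribution("cdcdcdcd"),
}

author, topic = classify("cdcd", patterns)
```

Because the label is a pair, the author alone or the topic alone can be read off the nearest pattern by taking one component of the returned tuple. How the preprint builds its reference patterns, and how it handles author recognition independently of topic, is not stated in the abstract and is not modeled here.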

Keywords: machine classification, text, bigram distribution, spectral portrait, clustering.

DOI: 10.20948/prepr-2022-43


