A. M. Fedotov, O. V. Prozorov, O. A. Fedotova, A. A. Bapanov, “On the Approach to the Thematic Classification of Documents”, Novosibirsk State University Journal of Information Technologies, 2017, Volume 15, Issue 1, Pages 79

This article is cited in 1 paper

On the Approach to the Thematic Classification of Documents

A. M. Fedotov^a, O. V. Prozorov^b, O. A. Fedotova^c, A. A. Bapanov^d

^a Institute of Computational Technologies SB RAS, 6 Acad. Lavrentiev Ave., Novosibirsk, 630090, Russian Federation
^b Novosibirsk State University, 2 Pirogov Str., Novosibirsk, 630090, Russian Federation
^c State Public Scientific and Technical Library SB RAS, 15 Voskhod Str., Novosibirsk, 630200, Russian Federation
^d L. N. Gumilyov Eurasian National University, 2 Satpaev Str., Astana, 010000, Kazakhstan

Abstract: This paper analyzes approaches and algorithms for classifying text documents and considers an approach to their thematic classification. The approach relies on a specially constructed measure of proximity between documents that takes the specifics of the subject area into account. The weight coefficients in the formula for the proximity measure are determined by the assumed a priori reliability of the data on the corresponding scale.
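
The abstract does not give the formula itself, so the following is only a minimal sketch of what a weighted proximity measure over nominal scales might look like. It assumes that each document is described by several scales (here the hypothetical names `keywords`, `udc`, and `authors`), that similarity on each scale is the Jaccard coefficient of the two value sets, and that the overall measure is a normalized weighted sum. The scale names, the weights, and the choice of the Jaccard coefficient are illustrative assumptions, not the authors' definitions.

```python
def scale_similarity(values_a, values_b):
    """Jaccard coefficient of two sets of nominal values (an assumed per-scale measure)."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0  # no data on this scale: treat as no evidence of proximity
    return len(a & b) / len(a | b)


def proximity(doc_a, doc_b, weights):
    """Normalized weighted sum of per-scale similarities.

    `weights` maps a scale name to a non-negative coefficient; a larger
    coefficient stands for higher assumed a priori reliability of that scale.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * scale_similarity(doc_a.get(scale, ()), doc_b.get(scale, ()))
        for scale, w in weights.items()
    )
    return score / total


# Hypothetical weights and documents, for illustration only.
weights = {"keywords": 0.5, "udc": 0.3, "authors": 0.2}
doc_1 = {"keywords": {"document", "classification"}, "udc": {"004.91"}, "authors": {"A"}}
doc_2 = {"keywords": {"document", "indexing"}, "udc": {"004.91"}, "authors": {"B"}}

print(round(proximity(doc_1, doc_2, weights), 4))
```

Under these assumptions the measure is symmetric and lies in the range from 0 to 1, so a document could be assigned to the thematic class whose representative documents it is closest to.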

Keywords: document, coordinate indexing, measure of proximity, nominal scale.

UDC: 004.91