
Kazan. Gos. Univ. Uchen. Zap. Ser. Fiz.-Mat. Nauki, 2008 Volume 150, Book 4, Pages 25–40 (Mi uzku698)

This article is cited in 6 papers

Automatic Text Categorization: Methods and Problems

M. S. Ageev, B. V. Dobrov, N. V. Loukachevitch

Research Computer Center, M. V. Lomonosov Moscow State University

Abstract: The paper analyzes three approaches to text categorization: manual categorization, knowledge-based categorization, and machine learning. Their advantages and problems are described. Two methods intended to overcome the problems of automatic text categorization are considered, and their evaluation on public collections is presented. The first method is based on a large linguistic resource, the RuThes thesaurus, and on the ALOT document processing technique. The second is a machine learning method of text categorization that generates descriptions of categories in the form of Boolean formulas.

Keywords: document processing, automatic text categorization, thesaurus, machine learning.

UDC: 004.912+004.822+004.855.5

Received: 26.02.2008


