
Artificial Intelligence and Decision Making, 2019 Issue 3, Pages 52–59 (Mi iipr180)

Natural language processing

Feature selection for text classification of news flows based on the topical importance characteristic

V. V. Zhebel^a, S-N. A. Zharikova^b, I. V. Sochenkov^a

a Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia
b Peoples' Friendship University of Russia named after Patrice Lumumba, Moscow, Russia

Abstract: The paper presents an approach to ranking the most valuable features for the text classification task. The introduced Topical Importance Characteristic strengthens feature selection by incorporating information about how words or phrases are distributed among topics. We compare this method with the well-known TF-IDF approach and use the introduced word-ranking scheme in two classifiers: Random Forest and Multinomial Naïve Bayes. Classification accuracy was tested on the “20-Newsgroups” dataset. The developed approach outperforms TF-IDF-based methods and matches the accuracy achieved on the same dataset by more powerful state-of-the-art approaches such as SVC.
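
For orientation, the TF-IDF baseline mentioned in the abstract can be sketched in a few lines of plain Python. This is only an illustration of TF-IDF word ranking, not the Topical Importance Characteristic itself, whose formula is not given on this page. The function names `tfidf_scores` and `top_terms`, the choice to rank each term by its maximum score, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, k):
    """Rank terms by their maximum tf-idf over all documents; keep the top k.

    Ranking by the maximum is an assumption of this sketch, not the paper's rule.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Toy corpus invented for illustration (not taken from 20-Newsgroups).
docs = [
    "the rocket launch the orbit".split(),
    "the hockey game the goal".split(),
    "the rocket engine".split(),
]
selected = top_terms(docs, 3)
```

A word such as "the", which occurs in every document, receives a score of zero and is never selected. Note that TF-IDF uses no topic labels; according to the abstract, the proposed characteristic adds exactly that kind of information, namely how words are distributed among topics.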

Keywords: topical text classification, machine learning, topical importance characteristic, 20-Newsgroups.

DOI: 10.14357/20718594190306




