
Mat. Model., 2020 Volume 32, Number 2, Pages 37–57 (Mi mm4154)

This article is cited in 1 paper

Matrix text models. Text corpora models

M. G. Kreines, E. M. Kreines

BaseTech Llc, Moscow

Abstract: We present models of text corpora built on the matrix model of natural-language texts. As methods for forming collection models, we consider techniques for computationally identifying the thematic structure of collections. We propose using these models to search for thematically similar text collections and to categorize texts by topic, based on models of both individual texts and text collections. We also analyze how the proposed models of text collections differ from common approaches to analyzing and modeling such collections.

Keywords: natural language texts, text corpora, text corpora models, topic models, text models, text information retrieval.

Received: 16.05.2019
Revised: 16.05.2019
Accepted: 01.07.2019

DOI: 10.20948/mm-2020-02-03


English version:
Mathematical Models and Computer Simulations, 2020, 12:5, 779–790

