
Computer Research and Modeling, 2012 Volume 4, Issue 4, Pages 693–706 (Mi crm522)

This article is cited in 14 papers

MATHEMATICAL MODELING AND NUMERICAL SIMULATION

Regularization, robustness and sparsity of probabilistic topic models

K. V. Vorontsov (a), A. A. Potapenko (b)

(a) RUKONT-PhysTech Laboratory, CMAM department, MIPT, 9 Institutskii per., Dolgoprudny, Moscow Region, 141700, Russia
(b) CMC department, Moscow State University, Leninskie gory, Moscow, 119991, Russia

Abstract: We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models such as PLSA, LDA, CVB0, and SWB, among many others, can be considered special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA.
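The base model that this family generalizes is PLSA fitted by the EM-algorithm: p(w|d) is modeled as a mixture over topics, with an E-step computing p(t|d,w) and an M-step re-estimating the topic and document distributions from expected counts. A minimal sketch of plain (unregularized) PLSA is shown below; the function name `plsa_em` and the NumPy formulation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def plsa_em(n_dw, num_topics, num_iters=50, seed=0):
    """Fit plain PLSA by EM (illustrative sketch, not the paper's code).

    n_dw: (docs x words) term-count matrix.
    Returns phi (words x topics, p(w|t)) and theta (topics x docs, p(t|d)).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = rng.random((num_words, num_topics))
    phi /= phi.sum(axis=0)                    # columns are distributions p(w|t)
    theta = rng.random((num_topics, num_docs))
    theta /= theta.sum(axis=0)                # columns are distributions p(t|d)
    for _ in range(num_iters):
        # E-step: p(t|d,w) is proportional to phi[w,t] * theta[t,d]; it is
        # folded into the multiplicative M-step via the ratio n_dw / p(w|d).
        p_wd = phi @ theta                    # model p(w|d), shape (words x docs)
        ratio = n_dw.T / np.maximum(p_wd, 1e-12)
        phi_new = phi * (ratio @ theta.T)     # unnormalized expected counts n_wt
        theta_new = theta * (phi.T @ ratio)   # unnormalized expected counts n_td
        phi = phi_new / phi_new.sum(axis=0)   # M-step normalization
        theta = theta_new / theta_new.sum(axis=0)
    return phi, theta
```

The regularized variants discussed in the paper (LDA-like smoothing, sparsifying, robustness terms) would modify the M-step normalization above; this sketch shows only the shared EM skeleton.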

Keywords: text analysis, topic modeling, probabilistic latent semantic analysis, EM-algorithm, latent Dirichlet allocation, Gibbs sampling, Bayesian regularization, perplexity, robustness.

UDC: 004.852

Received: 06.09.2012

DOI: 10.20537/2076-7633-2012-4-4-693-706



© Steklov Math. Inst. of RAS, 2025