Trudy Inst. Mat. i Mekh. UrO RAN, 2020 Volume 26, Number 3, Pages 56–68 (Mi timm1745)

This article is cited in 2 papers

Convergence of the Algorithm of Additive Regularization of Topic Models

I. A. Irkhin, K. V. Vorontsov

Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow Region

Abstract: Probabilistic topic modeling poses the following problem: given a collection of text documents, find the conditional distribution over topics for each document and the conditional distribution over words (or terms) for each topic. The problem is solved by log-likelihood maximization. It generally has infinitely many solutions and is therefore ill-posed in the sense of Hadamard. In the framework of Additive Regularization of Topic Models (ARTM), a weighted sum of regularization criteria is added to the main log-likelihood criterion. The resulting optimization problem is solved numerically by an iterative EM-type algorithm written in general form for an arbitrary smooth regularizer as well as for a linear combination of smooth regularizers. This paper studies the convergence of this EM iterative process. Sufficient conditions are obtained for convergence to a stationary point of the regularized log-likelihood. The constraints imposed on the regularizer are not too restrictive, and we interpret them from the point of view of the practical implementation of the algorithm. A modification of the algorithm is proposed that improves convergence without additional time or memory costs. Experiments on a news text collection show that the modification both accelerates convergence and improves the value of the optimized criterion.
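The abstract does not state the update formulas, so the following is only a minimal illustrative sketch of a regularized EM iteration of the kind it describes. It assumes the M-step form commonly given in the ARTM literature, phi_wt ∝ max(n_wt + phi_wt ∂R/∂phi_wt, 0) and theta_td ∝ max(n_td + theta_td ∂R/∂theta_td, 0), and it assumes a simple logarithmic regularizer R = tau_phi Σ ln phi_wt + tau_theta Σ ln theta_td, for which the regularization term reduces to the constants tau_phi and tau_theta. The names artm_em, normalize_positive, log_likelihood, tau_phi and tau_theta are hypothetical, and the sketch does not implement the modification proposed in the paper.

```python
import math
import random


def normalize_positive(values):
    """Clip negative entries to zero, then normalize to a probability vector.

    If everything is clipped to zero, return an all-zero vector
    (a degenerate case that this sketch does not try to repair).
    """
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0.0:
        return [0.0] * len(values)
    return [v / total for v in clipped]


def artm_em(n_dw, num_topics, tau_phi=0.0, tau_theta=0.0, iters=30, seed=0):
    """Regularized EM for a topic model on a dense D x W count matrix n_dw.

    Returns phi[t][w] ~ p(w | t) and theta[d][t] ~ p(t | d).
    With tau_phi = tau_theta = 0 this reduces to plain PLSA.
    """
    rng = random.Random(seed)
    D, W, T = len(n_dw), len(n_dw[0]), num_topics
    phi = [normalize_positive([rng.random() for _ in range(W)]) for _ in range(T)]
    theta = [normalize_positive([rng.random() for _ in range(T)]) for _ in range(D)]

    for _ in range(iters):
        n_wt = [[0.0] * W for _ in range(T)]  # expected word-topic counts
        n_td = [[0.0] * T for _ in range(D)]  # expected topic-document counts
        for d in range(D):
            for w in range(W):
                if n_dw[d][w] == 0:
                    continue
                # E-step: p(t | d, w) is proportional to phi_wt * theta_td
                joint = [phi[t][w] * theta[d][t] for t in range(T)]
                z = sum(joint)
                if z == 0.0:
                    continue
                for t in range(T):
                    c = n_dw[d][w] * joint[t] / z
                    n_wt[t][w] += c
                    n_td[d][t] += c
        # M-step: add the regularizer term (a constant for the assumed
        # logarithmic regularizer), clip at zero, and normalize.
        phi = [normalize_positive([n_wt[t][w] + tau_phi for w in range(W)])
               for t in range(T)]
        theta = [normalize_positive([n_td[d][t] + tau_theta for t in range(T)])
                 for d in range(D)]
    return phi, theta


def log_likelihood(n_dw, phi, theta):
    """Unregularized log-likelihood: sum over d, w of n_dw * ln p(w | d)."""
    total = 0.0
    for d, row in enumerate(n_dw):
        for w, n in enumerate(row):
            if n:
                p = sum(phi[t][w] * theta[d][t] for t in range(len(phi)))
                total += n * math.log(p)
    return total
```

In this sketch a positive tau acts as smoothing and a negative tau as sparsing; the clipping at zero in normalize_positive is what allows a sparsing regularizer to zero out entries, and it is also where degenerate all-zero rows can appear.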

Keywords: natural language processing, probabilistic topic modeling, probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), additive regularization of topic models (ARTM), EM-algorithm, sufficient conditions for convergence.

UDC: 519.853.4

MSC: 90C30, 68T50

Received: 20.07.2020
Revised: 06.08.2020
Accepted: 17.08.2020

DOI: 10.21538/0134-4889-2020-26-3-56-68


English version:
Proceedings of the Steklov Institute of Mathematics (Supplementary issues), 2021, 315, suppl. 1, S128–S139
