Zh. Vychisl. Mat. Mat. Fiz., 2010 Volume 50, Number 4, Pages 770–783 (Mi zvmmf4868)

This article is cited in 5 papers

Automatic determination of the numbers of components in the EM algorithm for the restoration of a mixture of normal distributions

D. P. Vetrov (a), D. A. Kropotov (b), A. A. Osokin (a)

(a) Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow, 119992, Russia
(b) Dorodnicyn Computing Center, Russian Academy of Sciences, ul. Vavilova 40, Moscow, 119333, Russia

Abstract: The classical EM algorithm for restoring a mixture of normal probability distributions cannot determine the number of components in the mixture. An algorithm called ARD EM is proposed that determines the number of components automatically; it is based on the relevance vector machine. The idea is to start with a redundant number of mixture components and then determine the relevant ones by maximizing the evidence. Experiments on model problems show that the number of clusters determined in this way either coincides with the true number or slightly exceeds it. In addition, the clustering produced by ARD EM turns out to be closer to the true clustering than that produced by alternatives based on cross-validation and the minimum description length principle.
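
The general idea of starting with too many components and letting the irrelevant ones drop out can be illustrated with a minimal sketch. It is not the ARD EM algorithm of the paper: instead of maximizing the evidence, it uses a simple weight-shrinkage heuristic in which each component's effective count is reduced by a fixed penalty and components whose count falls to zero are removed. The function name `fit_pruned_mixture`, the `penalty` value, the one-dimensional setting, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def fit_pruned_mixture(xs, k_init=8, penalty=1.0, n_iter=200, seed=0):
    """EM for a 1-D Gaussian mixture that starts with k_init components and
    removes those whose penalized effective count drops to zero.
    Heuristic stand-in only; NOT the evidence-based ARD EM procedure."""
    rng = random.Random(seed)
    means = rng.sample(xs, k_init)  # start the means at random data points
    mean_x = sum(xs) / len(xs)
    var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
    variances = [var_x] * k_init
    weights = [1.0 / k_init] * k_init

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])

        # M-step with shrinkage: subtract the penalty from each effective count.
        counts = [sum(r[k] for r in resp) for k in range(len(weights))]
        shrunk = [max(0.0, c - penalty) for c in counts]
        keep = [k for k, s in enumerate(shrunk) if s > 0.0]
        if not keep:  # never remove every component
            best = max(range(len(counts)), key=counts.__getitem__)
            keep, shrunk[best] = [best], 1.0

        norm = sum(shrunk[k] for k in keep)
        new_w, new_m, new_v = [], [], []
        for k in keep:
            n_k = max(counts[k], 1e-12)
            m = sum(r[k] * x for r, x in zip(resp, xs)) / n_k
            v = sum(r[k] * (x - m) ** 2 for r, x in zip(resp, xs)) / n_k
            new_w.append(shrunk[k] / norm)
            new_m.append(m)
            new_v.append(max(v, 1e-6))  # variance floor avoids degenerate spikes
        weights, means, variances = new_w, new_m, new_v

    return weights, means, variances


# Synthetic data: two well-separated clusters of 100 points each.
data_rng = random.Random(1)
data = ([data_rng.gauss(-3.0, 1.0) for _ in range(100)]
        + [data_rng.gauss(3.0, 1.0) for _ in range(100)])

weights, means, variances = fit_pruned_mixture(data)
print(len(weights), "components kept out of 8")
```

How many components survive depends on the penalty and on the initialization, and this heuristic may well keep more components than the data contain; the evidence-based criterion described in the abstract is the authors' way of making that choice in a principled manner.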

Key words: pattern recognition, probability density restoration, cluster analysis, determination of the number of clusters, EM algorithm, Bayesian learning, automatic relevance determination.

UDC: 519.6:519.7

Received: 24.07.2009
Revised: 11.11.2009

English version:
Computational Mathematics and Mathematical Physics, 2010, 50:4, 733–746
