Abstract:
The classical EM algorithm for the restoration of the mixture of normal probability distributions cannot determine the number of components in the mixture. An algorithm called ARD EM for the automatic determination of the number of components is proposed, which is based on the relevance vector machine. The idea behind this algorithm is to use a redundant number of mixture components at the first stage and then determine the relevant components by maximizing the evidence. Experiments with model problems show that the number of clusters thus determined either coincides with the actual number or slightly exceeds it. In addition, clusterization using ARD EM turns out to be closer to the actual clusterization than that obtained by the analogs based on cross validation and the minimum description length principle.
Key words:pattern recognition, probability density restoration, cluster analysis, determination of the number of clusters, EM algorithm, Bayesian learning, automatic relevance determination.