Abstract:
The aim of this paper is to construct a hierarchical thematic model for the abstracts of a major conference. We use a Discriminative Probabilistic Model to cluster abstracts at each level of the hierarchical structure. We propose modifying the Discriminative Probabilistic Model to suit the balanced structure of the conference; in the modified models, the influence of cluster size is reduced. Semi-supervised learning is used for document clustering. We construct a thematic model at each level of the conference structure, and we propose a hierarchical divisive clustering algorithm that builds the hierarchical thematic model from these per-level models. The algorithms are applied to a collection of abstracts from the EURO conference. The constructed model is compared with the experts' model of EURO.