
Proceedings of ISP RAS, 2020 Volume 32, Issue 4, Pages 165–174 (Mi tisp532)

This article is cited in 1 paper

Two step method for grouping news with similar topics

K. A. Skorniakov a,b, A. S. Laskina a,b, D. Yu. Turdakov b,c

a Moscow Institute of Physics and Technology
b Ivannikov Institute for System Programming of the Russian Academy of Sciences
c Lomonosov Moscow State University

Abstract: The amount of news has grown rapidly in recent years, and people cannot process it effectively. This is the main reason why automatic methods of news stream analysis have become an important area of research. This paper addresses the part of news stream analysis called “event detection”, where an “event” is a group of news items devoted to one real-world event. We study news from Russian news agencies. We treat the task as clustering of news items and compare algorithms using external clustering metrics. The paper introduces a novel approach to detecting events in Russian-language news. We propose a two-stage clustering method comprising a “rough” clustering algorithm at the first stage and a refining classifier at the second stage. At the first stage, a combination of the shingles method and naive named-entity-based clustering is used. We also present a labeled dataset for news event detection based on the “Yandex News” service. This manually labeled dataset can be used to evaluate the performance of event detection methods. Empirical evaluation on these corpora demonstrated the effectiveness of the proposed method for event detection in news texts.
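To illustrate the general idea behind the shingles part of the first stage, here is a minimal sketch of rough clustering by shingle overlap. It is not the authors' implementation: the word-level shingle size `k`, the Jaccard similarity measure, the `threshold` value, the greedy assignment strategy, and all function names are assumptions made for this example. The named-entity component and the second-stage classifier are not shown.

```python
def shingles(text, k=3):
    """Return the set of word-level k-shingles of a text (k is an assumed value)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rough_clusters(texts, k=3, threshold=0.3):
    """Greedily group texts whose shingle sets overlap with a cluster's first member.

    Hypothetical simplification: each cluster is represented by the shingle
    set of the first text assigned to it.
    """
    clusters = []  # list of (representative shingle set, member indices)
    for idx, text in enumerate(texts):
        sh = shingles(text, k)
        for rep, members in clusters:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]


news = [
    "the central bank raised the key rate today",
    "the central bank raised the key rate on friday",
    "local team wins the football cup final",
]
groups = rough_clusters(news)
```

In this toy input the first two headlines share five of eight distinct shingles and fall into one group, while the third forms its own. A rough grouping of this kind would then be refined by a second stage, as the abstract describes.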

Keywords: event detection, clustering, news.

DOI: 10.15514/ISPRAS-2020-32(4)-12


