D. Dzendzik, S. Serebryakov, “Semi-automatic generation of linear event extraction patterns for free texts”, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, 2013, Volume 155, Book 4, Pages 99

Semi-automatic generation of linear event extraction patterns for free texts

D. Dzendzik^a,b, S. Serebryakov^b

^a Saint-Petersburg State University, Saint Petersburg, Russia
^b Hewlett-Packard Laboratories, Saint Petersburg, Russia

Abstract: In this paper we describe a semi-automatic approach to generating event extraction patterns for free texts. The algorithm consists of four steps: we automatically extract candidate events from a corpus of free-text documents, cluster them by their dependency parse tree paths, validate a random sample from each cluster, and generate linear patterns from the clusters validated as positive. We compare the approach with a system that uses handcrafted patterns.
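The four steps in the abstract can be illustrated as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation. Candidate extraction is assumed to have already happened: the `Candidate` record and its `dep_path` string are hypothetical. Clustering here is plain grouping on identical paths. A callback stands in for the human validator, and the `sample_size` and `threshold` values are illustrative. Python regular expressions stand in for the TextMARKER/RUTA linear patterns the paper targets.

```python
import random
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one automatically extracted candidate event."""
    tokens: tuple[str, ...]  # tokens of the source sentence
    arg1: int                # index of the first argument token
    arg2: int                # index of the second argument token
    dep_path: str            # assumed dependency path label, e.g. "nsubj<-acquire->dobj"


def cluster_by_path(cands: list[Candidate]) -> dict[str, list[Candidate]]:
    """Step 2 (simplified): group candidates that share an identical dependency path."""
    clusters: dict[str, list[Candidate]] = defaultdict(list)
    for c in cands:
        clusters[c.dep_path].append(c)
    return dict(clusters)


def validate(clusters: dict[str, list[Candidate]],
             is_event: Callable[[Candidate], bool],
             sample_size: int = 5,
             threshold: float = 0.8,
             seed: int = 0) -> dict[str, list[Candidate]]:
    """Step 3: judge a random sample per cluster; keep clusters whose positive share >= threshold.

    `is_event` is a stand-in for the human annotator (an assumption of this sketch).
    """
    rng = random.Random(seed)
    positive = {}
    for path, members in clusters.items():
        sample = rng.sample(members, min(sample_size, len(members)))
        share = sum(is_event(c) for c in sample) / len(sample)
        if share >= threshold:
            positive[path] = members
    return positive


def linear_pattern(c: Candidate) -> str:
    """Turn one candidate into a linear regex: arguments become capture groups,
    the tokens strictly between them are kept as escaped literals."""
    lo, hi = sorted((c.arg1, c.arg2))
    middle = [re.escape(t) for t in c.tokens[lo + 1:hi]]
    return r"\s+".join([r"(\w+)", *middle, r"(\w+)"])


def generate_patterns(positive: dict[str, list[Candidate]]) -> list[str]:
    """Step 4: collect the distinct linear patterns from all positive clusters."""
    return sorted({linear_pattern(c) for members in positive.values() for c in members})


if __name__ == "__main__":
    cands = [
        Candidate(("HP", "acquired", "Autonomy"), 0, 2, "nsubj<-acquire->dobj"),
        Candidate(("Oracle", "has", "acquired", "Sun"), 0, 3, "nsubj<-acquire->dobj"),
        Candidate(("Paris", "is", "beautiful"), 0, 2, "nsubj<-be->acomp"),
    ]
    positive = validate(cluster_by_path(cands), lambda c: "acquired" in c.tokens)
    for pattern in generate_patterns(positive):
        m = re.fullmatch(pattern, "Microsoft acquired GitHub")
        print(pattern, m.groups() if m else None)
```

Grouping on identical paths is the simplest possible clustering and keeps every pattern traceable to the cluster that produced it. A real system would likely normalize paths (for example, by lemma) so that more surface variants fall into the same cluster before validation.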

Keywords: event extraction, linear patterns, regular expressions, TextMARKER, RUTA.

UDC: 004.912

Received: 31.07.2013

Language: English