Informatika i Ee Primeneniya [Informatics and its Applications]

Inform. Primen., 2018 Volume 12, Issue 3, Pages 91–98 (Mi ia552)

This article is cited in 2 papers

Semantic processing of unstructured textual data based on the linguistic processor PullEnti

E. B. Kozerenko (a), K. I. Kuznetsov (a), D. A. Romanov (b)

a Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
b National Research University “Higher School of Economics,” 20 Myasnitskaya Str., Moscow 101000, Russian Federation

Abstract: The paper presents a method for building knowledge extraction systems on top of the PullEnti software toolkit. PullEnti comprises algorithms for morphological and semantic-syntactic analysis that extract entities of given types (persons, organizations, locations, and other target semantic objects) from natural language texts. The system uses dynamically connected components (plugins), so various functions can be activated without recompilation; the semantic analysis unit is incorporated in this way. During analysis, the system identifies semantic units (tokens), which are typed phrases: text, numerical data, etc. Examples of implemented projects for different subject areas are given.
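
The plugin-based design described in the abstract can be illustrated with a minimal sketch. It is not PullEnti's actual API: the names `Token`, `Entity`, `register`, `tokenize`, and `extract`, as well as the two toy analyzers and the `TITLES` set, are hypothetical and chosen only for illustration. The sketch assumes that analyzers are registered by name at run time and operate on a shared list of typed tokens (text or number).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    kind: str   # "text" or "number" (hypothetical token types)
    text: str
    start: int  # character offset in the source text


@dataclass
class Entity:
    type: str
    value: str


Analyzer = Callable[[List[Token]], List[Entity]]

# Registry of analyzers; new ones can be added without touching extract().
ANALYZERS: Dict[str, Analyzer] = {}


def register(name: str):
    """Decorator that plugs an analyzer into the registry under a name."""
    def wrap(fn: Analyzer) -> Analyzer:
        ANALYZERS[name] = fn
        return fn
    return wrap


def tokenize(text: str) -> List[Token]:
    """Split text into typed tokens: digit runs and letter runs."""
    tokens = []
    for m in re.finditer(r"\d+|[^\W\d_]+", text):
        kind = "number" if m.group().isdigit() else "text"
        tokens.append(Token(kind, m.group(), m.start()))
    return tokens


TITLES = {"Dr", "Mr", "Ms", "Prof"}  # toy cue words, an assumption


@register("person")
def person_analyzer(tokens: List[Token]) -> List[Entity]:
    # Toy rule: a capitalized word right after a title is a person name.
    found = []
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.text in TITLES and cur.kind == "text" and cur.text[0].isupper():
            found.append(Entity("PERSON", cur.text))
    return found


@register("year")
def year_analyzer(tokens: List[Token]) -> List[Entity]:
    # Toy rule: any four-digit number token is treated as a year.
    return [Entity("YEAR", t.text)
            for t in tokens if t.kind == "number" and len(t.text) == 4]


def extract(text: str, enabled: List[str]) -> List[Entity]:
    """Run only the enabled analyzers over the shared token list."""
    tokens = tokenize(text)
    entities: List[Entity] = []
    for name in enabled:
        entities.extend(ANALYZERS[name](tokens))
    return entities
```

For example, `extract("Dr. Ivanov joined the institute in 2018", ["person", "year"])` runs both toy analyzers over the same tokens, while passing `["year"]` activates only the year analyzer. A real system would rely on morphological and syntactic analysis rather than the surface rules used here.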

Keywords: semantic modeling; named entity recognition; data intensive domains; automated systems of knowledge extraction; semantic search; intelligent Internet technologies.

Received: 13.07.2018

DOI: 10.14357/19922264180313


