RUS  ENG
Full version
JOURNALS // Vestnik Yuzhno-Ural'skogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie // Archive

Vestnik YuUrGU. Ser. Mat. Model. Progr., 2012 Issue 13, Pages 119–127 (Mi vyuru74)

Programming & Computer Software

A Distributed Dictionary — Based Morphological Analysis Framework for Russian Language Processing

D. A. Ustalov, M. L. Goldstein

Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences (Yekaterinburg, Russian Federation)

Abstract: This article describes an approach to scaling service morphological parsing of words of natural language processing of various collections of documents in Russian. An overview and critical analysis of existing solutions. The requirements workbench vocabulary morphological analyzer were established. The distributed architecture of the web service morphological analysis, designed to a handle large collections of documents in Russian, presented the form of a structural model. This architecture is implemented as a prototype system in the programming language Ruby. The structure used in the morphological dictionary of a relational schema. Tests of this method in a distributed computing environment showed linear scalability of the proposed solutions. The configuration of the experiment involves the generation of the system load as a HTTP requests, system load balancing working nodes of a distributed system, application servers with a functioning database analyzer and morphological dictionary, as well as a caching node to reduce costs when you run queries to the dictionary. Applying this approach provides a linear increase in performance in distributed systems, automated processing of large volumes of text.

Keywords: distributed computing, natural language processing, corpus linguistics, data-intensive computing, morphological analysis.

UDC: 004.912

MSC: 68T50

Received: 08.06.2012



© Steklov Math. Inst. of RAS, 2024