Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2014, Volume 24, Issue 3, Pages 204–217 (Mi ssi370)

Using hash function for increasing speed of work of the software for morphological analysis of Russian texts

N. V. Somin, M. M. Sharnin

Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper addresses the problem of making morphological analysis of Russian texts more efficient. It describes a software system for morphological analysis, including its set of morphological characteristics and its algorithms. The paper mentions software systems for logic-semantic analysis of natural language texts in which this morphological analyzer has been applied. The system is discussed in terms of its memory footprint and speed. A method of storing morpholexical information based on hash functions is proposed, which provides fast access. The difficulties that arise in implementing this approach are discussed and possible solutions are considered. The paper describes the structure of the information arrays in the new version and the search algorithms implemented in it, as well as a subsystem for entering and updating morphological information. Specific parameters of the new implementation are given, together with data on its speedup relative to the previous version. The paper also discusses prospects for further development of the new version and for transferring the proposed approach to other components of the linguistic processor.

Keywords: morphological analysis; hash function; linguistic processor; logic-semantic analysis of natural language texts.

Received: 12.08.2014

DOI: 10.14357/08696527140315


