
Tr. SPIIRAN, 2013 Issue 30, Pages 189–203 (Mi trspy625)

Algorithm of thesaurus extension generation for enterprise search

D. Dontsov

St. Petersburg Institute for Informatics and Automation of RAS

Abstract: The main goal of this paper is to develop an algorithm for generating a synonym thesaurus. Modern search engines use such thesauri for query expansion. This approach makes it possible to return not only documents that contain the query words, but also documents that contain their synonyms or semantically similar terms. As part of this work, a semi-automatic method for training a named entity recognizer was developed. A semi-automatic method for validating the extracted entities is also presented.

Keywords: information retrieval, query expansion, thesaurus extension, synonym extraction, named entity recognition, string clustering.

UDC: 004.622

Received: 03.04.2013


