
Vestn. Astrakhan State Technical Univ. Ser. Management, Computer Sciences and Informatics, 2022, Number 2, Pages 41–51 (Mi vagtu717)

COMPUTER SOFTWARE AND COMPUTING EQUIPMENT

Applying artificial intelligence methods for solving problems of searching for semantic associates: case of toponym Moskva

A. V. Borovsky, E. E. Rakovskaya

Baikal State University, Irkutsk, Russia

Abstract: Current problems in toponymy involve studying individual words in order to restore the lost conceptual meaning of geographical names and to establish how those names reflected characteristic features of the terrain, the type of activity of the people who inhabited it, and so on. The purpose of the study is to determine the origin of the toponym Moskva by using artificial intelligence methods. Semantic similarity between words is calculated with a fastText embedding model built on the GeoWAC corpus of Russian-language texts and provided by the RusVectōrēs service. The model identifies the semantic associates of a toponym by representing words as vectors in a semantic space and finding the word vectors located closest to the vector of the original word. The toponym is analyzed with three methods: the method of semantic associates, cluster analysis, and a combined method that transforms the word with a lost meaning and then analyzes the semantic associates of the resulting set of word transformants. The method is formalized as a model that determines the similarity between the studied word and its associates using different versions of the embedding model for one or more text corpora. The associated words produced by the model are treated as a semantic cluster, and the cosine similarity between their vectors is used as the measure of similarity between elements of the cluster. To identify competing hypotheses about the origin of the toponym Moskva, a cluster analysis was carried out on the combined set of the ten nearest vector associates of every transformant of the word. As a result, four hypotheses were put forward: “a famous person”, “firearms”, “beekeeping” and “blood-sucking insects”. The probabilities of these hypotheses are estimated from the frequencies of words in the language corpus. The main hypothesis is “a famous person”.
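The associate search described in the abstract (cosine similarity between word vectors, the nearest vectors to a word, pooling the ten nearest associates of each transformant) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors, the transliterated words and the transformant "moska" are hypothetical toy data, not values from the GeoWAC fastText model. Keeping each associate's highest similarity when pooling, and scoring a hypothesis by the normalized summed corpus frequency of its cluster words, are assumed rules, since the abstract does not specify them. With a real RusVectōrēs model loaded through a library such as gensim, `KeyedVectors.most_similar` would play the role of `associates` below.

```python
import math

# HYPOTHETICAL toy vectors for illustration only; real fastText vectors
# have hundreds of dimensions and come from the trained model.
VECTORS = {
    "moskva": [0.9, 0.1, 0.2],
    "moska":  [0.8, 0.2, 0.1],   # hypothetical transformant of "moskva"
    "pchela": [0.1, 0.9, 0.3],   # "bee"
    "med":    [0.2, 0.8, 0.4],   # "honey"
    "komar":  [0.1, 0.2, 0.9],   # "mosquito"
    "mukha":  [0.2, 0.3, 0.8],   # "fly"
    "gorod":  [0.85, 0.15, 0.25],  # "city"
}


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def associates(word, vectors, topn=10):
    """Return up to `topn` (word, similarity) pairs nearest to `word`,
    excluding the word itself, in descending order of cosine similarity."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def pooled_associates(transformants, vectors, topn=10):
    """Pool the top-`topn` associates of every transformant into one ranked
    list, keeping each associate's highest similarity (ASSUMED rule)."""
    best = {}
    for form in transformants:
        if form not in vectors:
            continue  # transformants absent from the toy vocabulary are skipped
        for other, sim in associates(form, vectors, topn):
            if other in transformants:
                continue  # transformants are not counted as each other's associates
            best[other] = max(sim, best.get(other, -1.0))
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


def hypothesis_probabilities(clusters, frequencies):
    """ASSUMPTION: score each hypothesis by the summed corpus frequency of the
    words in its cluster, then normalize the scores so they sum to 1."""
    totals = {name: sum(frequencies.get(w, 0) for w in words)
              for name, words in clusters.items()}
    grand = sum(totals.values())
    if grand == 0:
        raise ValueError("no cluster word has a positive frequency")
    return {name: total / grand for name, total in totals.items()}


if __name__ == "__main__":
    for word, sim in pooled_associates(["moskva", "moska"], VECTORS, topn=3):
        print(f"{word}\t{sim:.3f}")
```

Running the script prints the pooled associates of the two toy transformants. Grouping such associates into clusters and labeling them as hypotheses is a separate step that this sketch does not implement; `hypothesis_probabilities` only shows one possible way to turn word frequencies into normalized scores once clusters exist.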

Keywords: embedding model, Russian language, method of word transformation, semantic associates, toponym Moskva, cluster analysis.

UDC: 004.048

Received: 21.04.2022
Accepted: 14.04.2022

DOI: 10.24143/2072-9502-2022-2-41-51
