A. M. Kolosov, A. I. Maĭsuradze, “Improving the quality of vector representations of words by using several sources of representations”, Intelligent systems. Theory and applications, 2026, Volume 30, Issue 1, Pages 87

Part 2. Special Issues in Intellectual Systems Theory

Improving the quality of vector representations of words by using several sources of representations

A. M. Kolosov^a, A. I. Maĭsuradze^b

^a Lomonosov Moscow State University, Faculty of Mechanics and Mathematics
^b Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics

Abstract: Word vector representations are widely used in machine translation, recommender systems, and information retrieval. The quality of such representations, measured as the rank correlation with expert assessments of semantic similarity, remains limited. This paper proposes an approach to improving the quality of word vector representations by merging several independent sources of primary representations. It introduces the notions of monotone and antimonotone quadruplets of words, and it formulates and verifies the hypothesis that the information contained in monotone quadruplets makes it possible to recover the true order of similarities for antimonotone quadruplets. The paper also proposes a method for selecting word quadruplets, a two-step correction procedure based on a fully connected layer and a quadruplet loss function, and a method for evaluating the quality of the resulting representations. Experiments with Word2Vec and GloVe models trained on a lemmatised Wikipedia corpus show that representation quality can be improved, as evaluated on the MEN, SimLex-999, and WordSim-353 expert datasets.

Keywords: word vector representations, semantic similarity, data fusion, quadruplet loss, multidimensional scaling, Word2Vec, GloVe.
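
Illustration: the abstract measures quality as the rank correlation between model similarities and expert assessments of semantic similarity. The sketch below shows one way such a score could be computed. It is not the authors' evaluation method: the choice of Spearman's coefficient, the use of cosine similarity, the function names, and the toy vectors and scores are all assumptions made for illustration, and no real MEN, SimLex-999, or WordSim-353 data is used.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks inside each group of equal values by their mean.
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def cosine(u, v):
    """Cosine similarity (assumed here as the model's similarity measure)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def evaluate(embeddings, scored_pairs):
    """Correlate model similarities with expert scores.

    embeddings   -- dict mapping a word to its vector (hypothetical format)
    scored_pairs -- iterable of (word1, word2, expert_score) triples
    Pairs with a word missing from `embeddings` are skipped.
    """
    model_sims, expert_scores = [], []
    for w1, w2, score in scored_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_sims.append(cosine(embeddings[w1], embeddings[w2]))
            expert_scores.append(score)
    return spearman(model_sims, expert_scores)


# Toy data invented for this sketch; not taken from any expert dataset.
toy_embeddings = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.9, 0.1]),
    "car": np.array([0.0, 1.0]),
    "bus": np.array([0.2, 0.9]),
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "bus", 8.0),
    ("cat", "car", 1.0),
    ("dog", "bus", 2.5),
]

score = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman correlation on toy data: {score:.3f}")
```

On the toy data the model ranks the four pairs in the same order as the invented expert scores, so the coefficient equals 1. Because only ranks enter the score, it depends solely on the ordering of pairwise similarities, which is the kind of information that comparisons within word quadruplets are concerned with.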