Tr. SPIIRAN, 2012 Issue 23, Pages 231–253 (Mi trspy544)

This article is cited in 1 paper

A quantitative analysis of the lexicon in Russian WordNet and Wiktionaries

A. V. Smirnov, V. M. Kruglov, A. A. Krizhanovsky, N. B. Lugovaya, A. A. Karpov, I. S. Kipyatkova

St. Petersburg Institute for Informatics and Automation of RAS

Abstract: This paper presents a quantitative analysis of the Russian lexicon. Three resources are examined: the thesaurus Russian WordNet and two electronic dictionaries, the Russian Wiktionary and the English Wiktionary. The numbers of Russian words and of their meanings (senses) are compared by part of speech. The distribution of words across parts of speech, the numbers of monosemous and polysemous words, and the distribution of words by number of meanings were calculated and compared across these dictionaries. The analysis of the distribution of words by number of meanings revealed a problem: the English Wiktionary contains few or no ambiguous Russian words with more than 4 meanings (in comparison with the Russian Wiktionary). The analysis shows that the average polysemy and the number and distribution of word senses follow similar patterns in both expert and collaborative resources, with relatively minor differences.
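
The quantities named in the abstract (monosemous and polysemous word counts, the distribution of words by number of meanings, and average polysemy) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `polysemy_stats`, the input format (a mapping from word to number of senses), and the toy data are hypothetical, and average polysemy is assumed here to be the mean number of senses over all words, which may differ from the definition used in the paper.

```python
from collections import Counter


def polysemy_stats(sense_counts):
    """Summarize a lexicon given as {word: number_of_senses}.

    Assumption: every word has at least one sense, and average polysemy
    is the mean number of senses over all words.
    """
    counts = list(sense_counts.values())
    total_words = len(counts)
    monosemous = sum(1 for c in counts if c == 1)
    return {
        "words": total_words,
        "monosemous": monosemous,
        "polysemous": total_words - monosemous,
        # Maps "number of meanings" -> "number of words with that many".
        "distribution": Counter(counts),
        "average_polysemy": sum(counts) / total_words if total_words else 0.0,
        # Words with more than 4 meanings, the range the abstract singles out.
        "more_than_4": sum(1 for c in counts if c > 4),
    }


# Toy data invented for illustration only; not taken from any dictionary.
toy_lexicon = {"дом": 3, "стол": 2, "кот": 1, "идти": 5, "синий": 1}
stats = polysemy_stats(toy_lexicon)
```

Running the same function on sense counts extracted from each resource, grouped by part of speech, would give comparable figures for the three dictionaries.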

Keywords: computational linguistics, lexicography, lexical analysis, Russian language.

UDC: 004.912

PACS: 01.30.Kj

MSC: 68T50

Received: 15.10.2012


