Proceedings of the Institute for System Programming of the RAS

Proceedings of ISP RAS, 2021, Volume 33, Issue 6, Pages 193–204 (Mi tisp654)

Weakly supervised word sense disambiguation using automatically labelled collections

A. S. Bolshina (a), N. V. Lukashevich (b)

a Lomonosov Moscow State University
b Research Computing Center of Lomonosov Moscow State University

Abstract: State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such data. To cope with this knowledge acquisition bottleneck for Russian, we first use a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on this synthetic data. Our work builds on the bootstrapping approach: starting from this seed of tagged instances, an ensemble of these classifiers is used to label samples from unannotated corpora. In addition, we explore several techniques for augmenting the newly labelled training examples. We show that even a simple bootstrapping approach based on an ensemble of weakly supervised models can improve on the initial word sense disambiguation models.
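The abstract describes two steps: pseudo-labelling a target word through its monosemous relatives, then bootstrapping with an ensemble of classifiers over unannotated text. The following is a minimal Python sketch of both steps under stated assumptions, not the authors' implementation. The toy `OverlapClassifier` is hypothetical and merely stands in for the paper's three weakly supervised models, which the abstract does not specify. The unanimous-vote rule (`min_votes`) and the fixed iteration count are likewise assumptions. The English word "bank" is a stand-in for a Russian target word.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Example = Tuple[List[str], str]  # (tokenised context, sense label)


def label_with_monosemous_relatives(
    corpus: List[List[str]], target: str, relatives: Dict[str, List[str]]
) -> List[Example]:
    """Turn occurrences of monosemous relatives into pseudo-labelled examples.

    `relatives` maps each sense of `target` to unambiguous related words.
    Each occurrence of such a word is replaced by `target` and labelled
    with the sense it stands for.
    """
    examples = []
    for sentence in corpus:
        for sense, words in relatives.items():
            for i, tok in enumerate(sentence):
                if tok in words:
                    context = sentence[:i] + [target] + sentence[i + 1:]
                    examples.append((context, sense))
    return examples


class OverlapClassifier:
    """Hypothetical toy ensemble member: scores each sense by how often its
    training contexts contain the words within +/- `window` tokens of the
    target, and abstains (returns None) when there is no overlap at all."""

    def __init__(self, target: str, window: int):
        self.target = target
        self.window = window
        self.profiles: Dict[str, Counter] = {}

    def _context(self, tokens: List[str]) -> List[str]:
        if self.target not in tokens:
            return tokens
        i = tokens.index(self.target)
        left = tokens[max(0, i - self.window):i]
        return left + tokens[i + 1:i + 1 + self.window]

    def fit(self, examples: List[Example]) -> "OverlapClassifier":
        self.profiles = {}
        for tokens, sense in examples:
            self.profiles.setdefault(sense, Counter()).update(self._context(tokens))
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        ctx = self._context(tokens)
        scores = {s: sum(p[w] for w in ctx) for s, p in self.profiles.items()}
        if not scores:
            return None
        best = max(sorted(scores), key=scores.get)  # ties -> alphabetically first
        return best if scores[best] > 0 else None


def bootstrap(seed, unlabelled, members, iterations=3, min_votes=None):
    """Ensemble self-training: retrain every member on the current training
    set, then move each unlabelled context whose top sense gets at least
    `min_votes` votes (assumed rule; default is unanimous) into training."""
    train, pool = list(seed), list(unlabelled)
    min_votes = len(members) if min_votes is None else min_votes
    for _ in range(iterations):
        for member in members:
            member.fit(train)
        remaining = []
        for tokens in pool:
            preds = (member.predict(tokens) for member in members)
            votes = Counter(p for p in preds if p is not None)
            if votes and votes.most_common(1)[0][1] >= min_votes:
                train.append((tokens, votes.most_common(1)[0][0]))
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):  # nothing new was labelled: stop early
            break
        pool = remaining
    return train, pool


# Toy usage with an English stand-in for a polysemous target word.
corpus = [["lender", "approved", "loan"], ["sat", "on", "riverbank", "near", "water"]]
relatives = {"finance": ["lender"], "river": ["riverbank"]}
seed = label_with_monosemous_relatives(corpus, "bank", relatives)
unlabelled = [
    ["bank", "approved", "mortgage", "loan"],
    ["fish", "near", "bank", "water"],
    ["bank", "holiday"],
]
members = [OverlapClassifier("bank", 1), OverlapClassifier("bank", 2)]
train, left = bootstrap(seed, unlabelled, members)
```

Requiring unanimous agreement trades coverage for precision: contexts on which the members disagree or abstain (here `["bank", "holiday"]`, whose context word never occurs in training) stay in the unlabelled pool rather than entering the training set with a possibly wrong sense.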

Keywords: word sense disambiguation, Russian dataset, RuWordNet.

DOI: 10.15514/ISPRAS-2021-33(6)-13


