Abstract:
Supervised word sense disambiguation (WSD) algorithms currently attain the best results on the main benchmarks. However, they require large sense-tagged training sets, and this requirement hinders the development of WSD systems for many low-resource languages, including Russian. To address this knowledge acquisition bottleneck for Russian, we investigate a method for automatic text labelling based on an ensemble of weakly supervised WSD models. Our experiments demonstrate that models retrained on the new pseudo-annotated data outperform the initial models.
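The following is a minimal sketch of the ensemble pseudo-labelling loop the abstract describes, not the paper's actual implementation: the model interfaces, the `pseudo_label` helper, and the `min_agreement` threshold are all illustrative assumptions.

```python
# A minimal sketch of pseudo-labelling with an ensemble of weakly supervised
# WSD models: the models annotate raw contexts, and only instances on which
# they sufficiently agree are kept as new training data. All names here are
# hypothetical, not from the paper.
from collections import Counter
from typing import Callable, List, Tuple

def pseudo_label(
    contexts: List[str],                     # contexts containing an ambiguous word
    models: List[Callable[[str], str]],      # each model maps a context to a sense id
    min_agreement: float = 0.8,              # keep only high-consensus labels (assumed)
) -> List[Tuple[str, str]]:
    labelled = []
    for context in contexts:
        votes = Counter(model(context) for model in models)
        sense, count = votes.most_common(1)[0]
        if count / len(models) >= min_agreement:
            labelled.append((context, sense))
    return labelled

# The resulting (context, sense) pairs would then serve as pseudo-annotated
# training data on which the WSD models are retrained.
```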
Keywords: word sense disambiguation, Russian dataset, ELMo, BERT.