
Zap. Nauchn. Sem. POMI, 2021 Volume 499, Pages 206–221 (Mi znsl7060)


Word-based Russian text augmentation for character-level models

R. B. Galinsky (a), A. M. Alekseev (b, a), S. I. Nikolenko (a, b)

(a) St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences
(b) Saint Petersburg State University

Abstract: Large-scale deep learning models, including models for natural language processing, require large training datasets, which may be unavailable for low-resource languages or specialized domains. We address the problem of small and poorly varied training data for NLP models with an approach based on augmenting the data with synonyms. We design a novel augmentation scheme that replaces words with synonyms and reshuffles the words, apply it to the Russian language, and report improved results on the sentiment analysis task.
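The two operations named in the abstract (synonym replacement and word reshuffling) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy synonym dictionary `SYNONYMS`, the replacement probability `p`, and the choice to shuffle only within fixed-size windows (`window`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from typing import Dict, List, Optional

# Hypothetical toy synonym dictionary (an assumption); a real setup would
# draw on a Russian thesaurus resource, which the abstract does not name.
SYNONYMS: Dict[str, List[str]] = {
    "хороший": ["отличный", "неплохой"],
    "фильм": ["кинофильм", "кино"],
    "очень": ["весьма", "крайне"],
}


def replace_with_synonyms(tokens: List[str], synonyms: Dict[str, List[str]],
                          p: float, rng: random.Random) -> List[str]:
    """Replace each token that has known synonyms with probability p."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok.lower())  # look up the lowercased token
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def local_shuffle(tokens: List[str], window: int,
                  rng: random.Random) -> List[str]:
    """Shuffle tokens inside consecutive, non-overlapping windows.

    Windowed shuffling is an assumed variant of "reshuffling the words".
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


def augment(sentence: str, p: float = 0.5, window: int = 3,
            seed: Optional[int] = None) -> str:
    """Produce one augmented variant of a whitespace-tokenized sentence."""
    rng = random.Random(seed)
    tokens = sentence.split()
    tokens = replace_with_synonyms(tokens, SYNONYMS, p, rng)
    tokens = local_shuffle(tokens, window, rng)
    return " ".join(tokens)


if __name__ == "__main__":
    for i in range(3):
        print(augment("очень хороший фильм", seed=i))
```

Because the augmentation operates on words but returns plain text, its output can be fed to a character-level model without any change to the model's input pipeline; `p` and `window` trade off diversity of the generated variants against how far they drift from the original sentence.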

Key words and phrases: Deep learning, natural language processing, data augmentation, sentiment analysis.

Received: 02.10.2020

Language: English



© Steklov Math. Inst. of RAS, 2024