V. Malykh, V. Lyalin, “Improving classification robustness for noisy texts with robust word vectors”, Zap. Nauchn. Sem. POMI, 2021, Volume 499, Pages 236

II

Improving classification robustness for noisy texts with robust word vectors

V. Malykh^abc, V. Lyalin^b

^a St. Petersburg Department of Steklov Institute of Mathematics, nab. r. Fontanki, 27, 191023, St. Petersburg, Russia
^b Moscow Institute of Physics and Technology, 9 Institutskiy per., 141701, Dolgoprudny, Russia
^c Institute for Systems Analysis, pr. 60-letiya Oktyabrya, 9, 117312, Moscow, Russia

Abstract: Text classification is a fundamental task in natural language processing, and a large body of research has been devoted to it. However, little work has investigated the noise robustness of the approaches developed so far. In this work, we bridge this gap by presenting the results of noise robustness testing of modern text classification architectures for English and Russian. We benchmark the CharCNN and SentenceCNN models and introduce a new model, called RoVe, that we show to be the most robust to noise.

Key words and phrases: word vectors, distributed representations, natural language processing.

UDC: 004.85

Received: 12.01.2019

Language: English