
Informatics and Automation, 2024, Volume 23, Issue 4, Pages 1173–1198 (Mi trspy1318)

Artificial Intelligence, Knowledge and Data Engineering

A method for recognition of sentiment and emotions in Russian speech transcripts using machine translation

A. Dvoynikova, I. Kagirov, A. Karpov

St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)

Abstract: This paper addresses the recognition of user emotions and sentiment in transcripts of Russian speech using lexical methods and machine translation. Because data for sentiment analysis of Russian texts is limited, the paper proposes a new approach based on machine translation of Russian texts into English. It also presents experimental results on the impact of partial and full machine translation on emotion and sentiment recognition. Partial translation means translating only those lexemes that are not included in Russian sentiment dictionaries, while full translation means translating the entire text. The translated text is then analyzed using different English sentiment dictionaries. Experiments demonstrated that combining all English sentiment dictionaries improves the accuracy of emotion and sentiment recognition in text data. The paper also explores the correlation between the length of the text data vector and its representativity. Experiments on emotion and sentiment recognition were conducted on expert and automatic transcripts of the multimodal Russian corpus RAMAS. The experimental results indicate that lemmatization is a more effective way of normalizing words in speech transcripts than stemming. The proposed methods involving full and partial machine translation improve sentiment and emotion recognition by 0.65-9.76% in terms of F-score compared to the baseline approach. Applying machine translation methods to the expert and automatic transcripts of RAMAS yielded a recognition accuracy of 31.12% and 23.74%, respectively, for 7 emotion classes, and 75.37% and 71.60%, respectively, for 3 sentiment classes. Additionally, the experiments revealed that using statistical vectors for text vectorization yields a 1-5% increase in F-score compared to concatenated (statistical and sentiment) vectors.
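
The difference between partial and full translation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionaries `RU_SENTIMENT` and `EN_SENTIMENT`, the lookup table `TOY_MT` that stands in for a machine translation system, and the three-way labels are all hypothetical and invented for illustration. Full translation is also simplified here to word-by-word lookup, whereas a real system would translate the whole text at once.

```python
from typing import Dict, List

# Toy resources, invented for illustration only (not taken from the paper).
RU_SENTIMENT: Dict[str, int] = {"хороший": 1, "плохой": -1}
EN_SENTIMENT: Dict[str, int] = {"good": 1, "bad": -1, "terrible": -1}
# Stand-in for a machine translation system (assumption).
TOY_MT: Dict[str, str] = {
    "хороший": "good",
    "плохой": "bad",
    "ужасный": "terrible",
    "день": "day",
}


def translate_word(word: str) -> str:
    """Return the toy translation, or the word itself if it is unknown."""
    return TOY_MT.get(word, word)


def score_partial(lemmas: List[str]) -> List[int]:
    """Partial translation: translate only lemmas missing from the Russian dictionary."""
    scores = []
    for lemma in lemmas:
        if lemma in RU_SENTIMENT:
            scores.append(RU_SENTIMENT[lemma])
        else:
            scores.append(EN_SENTIMENT.get(translate_word(lemma), 0))
    return scores


def score_full(lemmas: List[str]) -> List[int]:
    """Full translation: translate every lemma, then use only the English dictionary."""
    translated = [translate_word(lemma) for lemma in lemmas]
    return [EN_SENTIMENT.get(word, 0) for word in translated]


def label(scores: List[int]) -> str:
    """Map summed scores to a hypothetical three-way sentiment label."""
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    lemmas = ["ужасный", "день"]  # lemmatized input is assumed
    print(label(score_partial(lemmas)), label(score_full(lemmas)))
```

In this sketch "ужасный" is absent from the toy Russian dictionary, so the partial path falls back to its English translation, while the full path sends every lemma through the English dictionary.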

Keywords: machine translation, sentiment dictionaries, emotion recognition, sentiment analysis, sentiment vectors.

UDC: 004.912

Received: 08.11.2023

DOI: 10.15622/ia.23.4.9


