
Artificial Intelligence and Decision Making, 2019, Issue 2, Pages 3–14 (Mi iipr165)


Data mining

Application of deep learning methods to recognize the emotional state of a person in a video image

A. G. Shishkin, A. A. Moskvin

Lomonosov Moscow State University, Moscow, Russia

Abstract: In this paper, a model based on deep neural networks is developed and implemented that determines, in real time and with limited computing resources, the emotional state of a person from a video sequence containing both a voice signal associated with the person whose state is to be determined and a frontal view of that person's face. Visual information is represented by 16 consecutive frames of size 96 $\times$ 96 pixels, and the voice signal by 140 features computed over a sequence of 37 windows. Based on experimental studies, a model architecture using convolutional and recurrent neural networks is developed. For 7 classes corresponding to different emotional states (neutral, anger, sadness, fear, joy, disappointment, and surprise), the recognition accuracy is 59%. The studies showed that using audio information together with visual information can increase recognition accuracy by 12%. The created system is flexible with respect to the choice of parameters and to narrowing or expanding the set of classes, and it can easily add, accumulate, and use information from other external devices for further development and improvement of classification accuracy.
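The abstract specifies the input shapes (16 frames of 96 $\times$ 96 pixels and 140 voice features over 37 windows) and the use of convolutional and recurrent networks, but not the exact architecture. The following minimal PyTorch sketch shows one way such an audiovisual classifier can be wired together; the class name, layer sizes, grayscale input, and late-fusion design are illustrative assumptions, not the authors' published model.

```python
# A minimal sketch (assumed architecture, not the authors' exact model)
# of an audiovisual emotion classifier matching the input shapes from
# the abstract: 16 frames of 96x96 pixels, 140 voice features over
# 37 windows, and 7 emotion classes.
import torch
import torch.nn as nn

class AudioVisualEmotionNet(nn.Module):  # hypothetical name
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Visual branch: a 2D CNN applied to each of the 16 frames
        # (grayscale input is an assumption).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 96 -> 48
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 48 -> 24
            nn.AdaptiveAvgPool2d(1),              # -> (64, 1, 1)
        )
        # Recurrent layer aggregates per-frame CNN features over time.
        self.video_rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
        # Audio branch: a GRU over the 37 windows of 140 features each.
        self.audio_rnn = nn.GRU(input_size=140, hidden_size=128, batch_first=True)
        # Late fusion of the two modalities, then classification.
        self.classifier = nn.Linear(128 + 128, num_classes)

    def forward(self, frames: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 16, 1, 96, 96); audio: (batch, 37, 140)
        b, t = frames.shape[:2]
        v = self.cnn(frames.flatten(0, 1)).flatten(1)   # (b*t, 64)
        v = v.view(b, t, -1)                            # (b, 16, 64)
        _, v_h = self.video_rnn(v)                      # final hidden state
        _, a_h = self.audio_rnn(audio)
        fused = torch.cat([v_h[-1], a_h[-1]], dim=1)    # (b, 256)
        return self.classifier(fused)                   # logits over 7 emotions

model = AudioVisualEmotionNet()
logits = model(torch.randn(2, 16, 1, 96, 96), torch.randn(2, 37, 140))
print(logits.shape)  # torch.Size([2, 7])
```

Concatenating the final hidden states of the two recurrent branches is one simple fusion strategy consistent with the abstract's reported gain from combining audio and visual information; the paper itself may fuse the modalities differently.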

Keywords: artificial neural networks, deep learning, emotion recognition, video, speech signal.

DOI: 10.14357/20718594190201


