Zap. Nauchn. Sem. POMI, 2023, Volume 529, Pages 176–196 (Mi znsl7426)

Blending of predictions boosts understanding for multimodal advertisements

A. Alekseev (a), A. Savchenko (b), E. Tutubalina (c, d), E. Myasnikov (e), S. Nikolenko (a)

a Steklov Institute of Mathematics at St. Petersburg, Russia
b Sber AI Lab, Russia
c Sber AI, Russia
d Kazan Federal University, Russia
e Samara National Research University, Russia

Abstract: The advertising industry employs several content modalities to deliver implied messages: images, videos, text, music, and combinations of all of them. "Decoding" the message implied by multimodal content often requires both its textual and its visual components. We study three tasks: multimodal symbolism prediction, topic detection, and sentiment type classification. Motivated by the observation that the two modalities in an advertisement convey different parts of its message, we train separate models for images and for texts (with the text extracted by OCR) and significantly improve upon the current state of the art by blending image-based and text-based predictions. We provide a comprehensive experimental validation of this approach.
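The abstract does not spell out the blending rule. As a minimal sketch, assuming each model outputs a per-class probability matrix and that blending is a convex combination whose weight is tuned on held-out data, late fusion of an image model and a text (OCR) model could look like this. The function names `blend_predictions` and `choose_weight`, the weight grid, the accuracy criterion, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def blend_predictions(p_image, p_text, weight=0.5):
    """Hypothetical late fusion: convex combination of two (n_samples, n_classes)
    probability matrices, one from an image model and one from a text (OCR) model."""
    p_image = np.asarray(p_image, dtype=float)
    p_text = np.asarray(p_text, dtype=float)
    if p_image.shape != p_text.shape:
        raise ValueError("prediction matrices must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_image + (1.0 - weight) * p_text


def choose_weight(p_image, p_text, y_true, grid=None):
    """Grid-search the blending weight that maximizes accuracy on validation data
    (an assumed tuning rule; the paper's own procedure may differ)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    y_true = np.asarray(y_true)
    best_w, best_acc = None, -1.0
    for w in grid:
        # Single-label prediction: take the most probable class per sample.
        pred = blend_predictions(p_image, p_text, w).argmax(axis=1)
        acc = float((pred == y_true).mean())
        if acc > best_acc:  # keep the first weight reaching the best accuracy
            best_w, best_acc = float(w), acc
    return best_w, best_acc


# Toy validation data (made up for illustration): 3 ads, 2 classes.
p_img = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
p_txt = [[0.3, 0.7], [0.8, 0.2], [0.2, 0.8]]
y_val = [0, 0, 1]
w, acc = choose_weight(p_img, p_txt, y_val)
print(w, acc)
```

On this toy data the image model alone (weight 1.0) gets one ad in three right and the text model alone (weight 0.0) gets two, while intermediate weights get all three, which illustrates why combining modalities that carry different parts of the message can help. For a multi-label task one would threshold the blended per-class probabilities instead of taking `argmax`.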

Key words and phrases: multimodal, ads understanding, topic detection, sentiment, sentiment classification.

UDC: 004.852

Received: 12.10.2023

Language: English



© Steklov Math. Inst. of RAS, 2024