D. D. Bakshandaeva, D. V. Dimitrov, V. S. Arkhipkin, A. V. Shonenkov, M. S. Potanin, D. K. Karachev, A. V. Kuznetsov, A. D. Voronov, A. A. Petiushko, V. F. Davydova, E. V. Tutubalina, “Many heads but one brain: FusionBrain – a single multimodal multitask architecture and a competition”, Computer Optics, 2023, Volume 47, Issue 1, Pages 185

NUMERICAL METHODS AND DATA ANALYSIS

Many heads but one brain: FusionBrain – a single multimodal multitask architecture and a competition

D. D. Bakshandaeva^ab, D. V. Dimitrov^cad, V. S. Arkhipkin^a, A. V. Shonenkov^d, M. S. Potanin^d, D. K. Karachev^d, A. V. Kuznetsov^ade, A. D. Voronov^d, A. A. Petiushko^d, V. F. Davydova^a, E. V. Tutubalina^adf

^a Sber AI
^b University of Helsinki
^c Lomonosov Moscow State University
^d Artificial Intelligence Research Institute, Moscow
^e Samara National Research University
^f National Research University Higher School of Economics, Moscow

Abstract: Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called FusionBrain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple vision and language tasks. The FusionBrain Challenge combines the following tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created datasets for each task to test the participants’ submissions. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a baseline solution: a multimodal, multitask architecture built around a frozen foundation model and trained in Fusion mode as well as in Single-task mode. The proposed Fusion approach proves to be competitive with, and more energy-efficient than, the task-specific one.

Keywords: multimodality, multitask, bilinguality, foundation models, FusionBrain challenge

Received: 08.09.2022
Accepted: 21.11.2022

Publication language: English

DOI: 10.18287/2412-6179-CO-1220