Model. Anal. Inform. Sist., 2025 Volume 32, Number 1, Pages 80–94 (Mi mais842)

Artificial intelligence

Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example)

V. Y. Mamedov, D. A. Kovalevsky, D. A. Morozov, S. S. Stolyarov, S. S. Ospichev

Novosibirsk National Research State University, Novosibirsk, Russia

Abstract: The exponential growth in scientific publications has heightened the need for robust tools to organize and retrieve research effectively. The Universal Decimal Classification (UDC) serves as a valuable framework for categorizing articles by subject area. However, manually assigned UDC codes are often inaccurate or oversimplified, which limits their utility. In this study, we present a novel approach for the automated assignment of UDC codes to scientific articles using BERT-based models. The models were trained and evaluated on a dataset comprising over 19,000 articles in mathematics and related disciplines. To account for the hierarchical structure of UDC, we developed two specialized evaluation metrics: hierarchical classification accuracy and hierarchical recommendation accuracy. We also explored multiple strategies for flattening hierarchical labels. The approach achieved a hierarchical recommendation accuracy of 0.8220. Furthermore, blind expert evaluation revealed that discrepancies between reference and predicted labels often stem from errors in the original UDC code assignments made by article authors. Our approach demonstrates strong potential for automating the classification of scientific articles and can be extended to other hierarchical classification systems.
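
The abstract does not define the flattening strategies or the two hierarchical metrics, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes simple UDC main numbers in which each extra digit refines the class and dots are visual separators; the functions `udc_ancestors`, `flatten_to_depth`, and `prefix_overlap_score` are hypothetical names introduced here.

```python
def udc_ancestors(code: str) -> list[str]:
    """Return the prefixes of a UDC code, from broadest to most specific.

    Assumption: every additional digit refines the class and dots are only
    separators, so "004.912" has six levels.
    """
    prefixes = []
    current = ""
    for ch in code:
        current += ch
        if ch.isdigit():  # a dot alone does not open a new level
            prefixes.append(current)
    return prefixes


def flatten_to_depth(code: str, depth: int) -> str:
    """One possible flattening: truncate a code to at most `depth` digits."""
    levels = udc_ancestors(code)[:depth]
    return levels[-1] if levels else ""


def prefix_overlap_score(reference: str, predicted: str) -> float:
    """Hypothetical partial-credit score, NOT the paper's metric.

    Counts how many leading hierarchy levels the prediction shares with the
    reference, divided by the number of levels in the reference.
    """
    ref_levels = udc_ancestors(reference)
    pred_levels = udc_ancestors(predicted)
    if not ref_levels:
        return 0.0
    shared = 0
    for ref, pred in zip(ref_levels, pred_levels):
        if ref != pred:
            break
        shared += 1
    return shared / len(ref_levels)


if __name__ == "__main__":
    print(udc_ancestors("004.912"))
    print(flatten_to_depth("004.912", 3))
    print(prefix_overlap_score("004.912", "004.93"))
```

Under these assumptions, a prediction of 004.93 against the reference 004.912 earns partial credit for the four shared levels, whereas exact-match accuracy would count it as a plain miss. Real UDC notation also uses auxiliary signs (colons, parentheses, and so on), which this sketch ignores.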

Keywords: text classification, hierarchical text classification, universal decimal classifier, deep learning.

UDC: 004.912

MSC: 68T50

Received: 14.02.2025
Revised: 24.02.2025
Accepted: 26.02.2025

DOI: 10.18255/1818-1015-2025-1-80-94