Vestnik Yuzhno-Ural'skogo Gosudarstvennogo Universiteta. Seriya "Vychislitelnaya Matematika i Informatika"

Vestn. YuUrGU. Ser. Vych. Matem. Inform., 2023 Volume 12, Issue 1, Pages 28–45 (Mi vyurv291)

This article is cited in 1 paper

A method for creating structural models of text documents using neural networks

D. V. Berezkin, I. A. Kozlov, P. A. Martynyuk, A. M. Panfilkin

Bauman Moscow State Technical University (2nd Baumanskaya St. 5/1, Moscow, 105005, Russian Federation)

Abstract: The article describes modern BERT-based neural network models and considers their application to Natural Language Processing (NLP) tasks such as question answering and named entity recognition. It presents a method for automatically creating structural models of text documents. The proposed method is hybrid: it combines several NLP models. It builds a structural model of a document by extracting sentences that correspond to various aspects of the document. Information is extracted using the BERT Question Answering model with questions prepared separately for each aspect. The answers are filtered by the BERT Named Entity Recognition model and used to generate the contents of each field of the structural model. The article proposes two algorithms for generating field contents: the Exclusive answer choosing algorithm, used for short fields, and the Generalizing answer forming algorithm, used for voluminous fields. The article also describes a software implementation of the proposed method and discusses the results of experiments conducted to evaluate its quality.
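
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the algorithms, so every name here (`Answer`, `FieldSpec`, `collect_answers`, `exclusive_answer`, `generalizing_answer`, `build_structural_model`) is hypothetical. The `qa` and `ner` arguments are plain callables standing in for the BERT Question Answering and Named Entity Recognition models. The selection rules are also assumptions: the short-field rule takes the highest-scoring answer, and the voluminous-field rule joins the source sentences of answers scoring at least 0.5.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Answer:
    text: str       # answer span returned by the QA model
    score: float    # model confidence for the span
    sentence: str   # sentence of the document that contains the span


# Hypothetical interfaces standing in for the BERT models.
QAModel = Callable[[str, str], List[Answer]]   # (question, document) -> answers
NERModel = Callable[[str], Set[str]]           # text -> entity labels found


@dataclass
class FieldSpec:
    questions: List[str]                   # questions prepared for this aspect
    required_entity: Optional[str] = None  # entity label an answer must contain
    voluminous: bool = False               # True for long fields


def collect_answers(document: str, spec: FieldSpec,
                    qa: QAModel, ner: NERModel) -> List[Answer]:
    """Ask every question for the field and keep answers that pass the NER filter."""
    kept = []
    for question in spec.questions:
        for answer in qa(question, document):
            if spec.required_entity is None or spec.required_entity in ner(answer.text):
                kept.append(answer)
    return kept


def exclusive_answer(answers: List[Answer]) -> str:
    """Assumed rule for short fields: keep the single highest-scoring answer."""
    return max(answers, key=lambda a: a.score).text if answers else ""


def generalizing_answer(answers: List[Answer], document: str,
                        threshold: float = 0.5) -> str:
    """Assumed rule for voluminous fields: join the sentences of confident answers."""
    sentences = {a.sentence for a in answers if a.score >= threshold}
    # Restore document order so the field reads like the source text.
    return " ".join(sorted(sentences, key=document.find))


def build_structural_model(document: str, fields: Dict[str, FieldSpec],
                           qa: QAModel, ner: NERModel) -> Dict[str, str]:
    """Fill every field of the structural model from the filtered answers."""
    model = {}
    for name, spec in fields.items():
        answers = collect_answers(document, spec, qa, ner)
        if spec.voluminous:
            model[name] = generalizing_answer(answers, document)
        else:
            model[name] = exclusive_answer(answers)
    return model
```

Passing the models in as callables keeps the sketch independent of any particular BERT library, so the field-generation logic can be tried with simple stub functions before real models are attached.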

Keywords: information extraction, neural network, named entity recognition, question-answering system.

UDC: 004.89

Received: 03.11.2022

Language: English

DOI: 10.14529/cmse230102


