
Program Systems: Theory and Applications, 2023 Volume 14, Issue 1, Pages 95–123 (Mi ps418)

Information Systems in Medicine

A system for extracting symptom mentions from texts by means of neural networks

Yu. P. Serdyuk, N. A. Vlasova, S. R. Momot

Ailamazyan Program Systems Institute of RAS, Ves'kovo, Russia

Abstract: This paper presents a system for extracting symptom mentions from medical texts written in natural language (Russian). The system finds symptom mentions in a text, reduces them to a standard form, and assigns each found symptom to a group of similar symptoms. Each processing stage uses a separate neural network. We extract symptoms from three disease areas: allergic diseases, pulmonological diseases, and coronavirus infection (COVID-19). We present and describe an annotated corpus of sentences used to train the neural networks for symptom mention extraction. The sentences were marked up in a simple XML-like language, and an extended BIO markup format is proposed for the sentences fed directly to the neural network. We evaluate the quality of symptom extraction under strict and flexible testing. We describe possible approaches to normalizing and identifying symptom mentions, together with their implementation. We compare our results with those of similar studies and thereby show the place of our system among clinical decision support systems.
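
As an illustration of the markup and testing ideas mentioned in the abstract, the following minimal sketch converts an inline-annotated sentence to plain BIO tags and counts span matches under a strict and a flexible criterion. It rests on assumptions not confirmed by the abstract: the tag name `<symptom>`, the labels `B-SYM`/`I-SYM`, whitespace tokenization, an English example sentence, and the common convention that strict means an exact span match while flexible means any overlap. The extended BIO format proposed in the paper is not reproduced here.

```python
import re

# Hypothetical tag name; the paper's actual XML-like markup is not given in the abstract.
TAG = re.compile(r"<symptom>(.*?)</symptom>")

def to_bio(annotated):
    """Convert an inline-annotated sentence to (token, tag) pairs in plain BIO."""
    pairs, pos = [], 0
    for m in TAG.finditer(annotated):
        # Tokens outside any annotation get the O tag.
        pairs += [(tok, "O") for tok in annotated[pos:m.start()].split()]
        # First token of a mention is B-SYM, the remaining ones are I-SYM.
        pairs += [(tok, "B-SYM" if i == 0 else "I-SYM")
                  for i, tok in enumerate(m.group(1).split())]
        pos = m.end()
    pairs += [(tok, "O") for tok in annotated[pos:].split()]
    return pairs

def spans(tags):
    """Return (start, end) token spans, end exclusive, from a BIO tag sequence."""
    out, start = [], None
    for i, t in enumerate(list(tags) + ["O"]):
        if start is not None and not t.startswith("I-"):
            out.append((start, i))
            start = None
        if t.startswith("B-"):
            start = i
    return out

def count_matches(gold, pred, strict=True):
    """Count predicted spans that match a gold span exactly (strict) or by overlap."""
    if strict:
        return sum(1 for p in pred if p in gold)
    return sum(1 for p in pred if any(p[0] < g[1] and g[0] < p[1] for g in gold))

sentence = ("Patient reports <symptom>dry cough</symptom> and "
            "<symptom>shortness of breath</symptom> today")
gold_tags = [tag for _, tag in to_bio(sentence)]
# A made-up prediction that misses the first token of the second mention.
pred_tags = ["O", "O", "B-SYM", "I-SYM", "O", "O", "B-SYM", "I-SYM", "O"]

gold_spans, pred_spans = spans(gold_tags), spans(pred_tags)
strict_hits = count_matches(gold_spans, pred_spans, strict=True)
flexible_hits = count_matches(gold_spans, pred_spans, strict=False)
```

In this made-up example the partially recognized mention counts as a hit only under the flexible criterion, which is why the two testing modes can yield different scores for the same model output.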

Key words and phrases: natural language processing, neural networks, information extraction, symptom mentions, annotated corpus, BERT models, COVID-19.

UDC: 81’322+61

MSC: Primary 68T07; Secondary 68T50

Received: 26.12.2022
29.01.2023
Accepted: 29.01.2023

DOI: 10.25209/2079-3316-2023-14-1-95-123


