Abstract:
This paper presents a system for extracting symptom mentions from medical texts written in natural language (Russian). The system finds symptom mentions in a text, normalizes them to a standard form, and assigns each found symptom to a group of similar symptoms. Each processing stage is handled by a separate neural network. We extract symptoms from three disease areas: allergic diseases, pulmonological diseases, and coronavirus infection (COVID-19). We present and describe an annotated corpus of sentences used to train the neural networks that extract symptom mentions. The sentences were annotated with a simple XML-like language, and we propose an extended BIO markup format for the sentences fed directly to the neural network. We evaluate the quality of symptom extraction under both strict and flexible testing. We also describe possible approaches to the normalization and identification of symptom mentions, together with their implementation. Finally, we compare our results with those reported in similar studies and thereby show where our system stands among clinical decision support systems.
Key words and phrases: natural language processing, neural networks, information extraction, symptom mentions, annotated corpus, BERT models, COVID-19.