Abstract:
The paper introduces a system for extracting symptoms from
clinical records (free-text documents in Russian) and automatically
predicting a diagnosis in the form of a disease name and its ICD-10 code. The
system is designed for a restricted domain of six pulmonary diseases (chronic
obstructive pulmonary disease, pneumonia, bronchial asthma, etc.) and COVID-19.
Several neural networks are used for symptom extraction: they
recognize certain medical entities and the relations between them. A neural
network classifier performs the automatic diagnosis. An annotated
corpus of sentences was created for training the neural networks, and the
principles and rules of the annotation are described. A corpus of texts is used for
training the classifier.
Both subsystems were tested, and the resulting accuracy estimates are provided.
The accuracy of diagnosis in the given domain is 88.5%. We also compare our
system with related work on symptom extraction from texts in various languages
and on automatic diagnosis, including systems such as ChatGPT.
Key words and phrases: clinical decision support systems, symptom extraction, automatic diagnosis prediction, BERT models, ChatGPT-based systems.