Abstract:
As part of this work, a specialized dictionary for finding key terms in the texts of medical drug instructions was compiled from VigiAccess, ICD-10 and rlsnet.ru data. The text corpus was first cleaned and normalized to a uniform format to improve the quality of model training. In future work, grls.rosminzdrav.ru is planned as a more authoritative and complete source of information on registered medicines. To automate data annotation, an algorithm was developed that finds dictionary terms in the texts and marks them in the BIO (Begin, Inside, Outside) format, producing structured markup for model training. A deep neural network model demonstrated high effectiveness in named entity recognition by taking contextual dependencies into account. A semantic graph of medicines was built using algorithms that find relations between named entities. However, automatically identifying deeper relations between graph nodes remains difficult: it requires additional data annotation that accounts for complex grammatical structures, which would improve the analysis of interactions described in medical instructions.
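To illustrate what dictionary-driven BIO annotation involves, here is a minimal sketch. It is not the authors' algorithm. It assumes whitespace tokenization, case-insensitive exact matching, greedy longest-match-first lookup, and typed tags such as `B-DRUG` / `I-DRUG`. The entity labels (`DRUG`, `ADR`, `DISEASE`) and the function name `bio_tag` are hypothetical. Real Russian-language instructions would also need a proper tokenizer and probably lemmatization before matching.

```python
from typing import Dict, List, Tuple


def bio_tag(tokens: List[str], dictionary: Dict[Tuple[str, ...], str]) -> List[str]:
    """Mark dictionary terms in a token list with BIO tags.

    Hypothetical sketch: `dictionary` maps a tuple of lowercase tokens
    (a possibly multi-word term) to an entity label.
    """
    # Lowercase once so matching is case-insensitive.
    lowered = [t.lower() for t in tokens]
    # Longest term length bounds how far ahead we ever need to look.
    max_len = max((len(term) for term in dictionary), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first so multi-word terms win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = dictionary.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"          # first token of the term
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # remaining tokens of the term
                matched = n
                break
        # Skip past a matched term, or advance one token if nothing matched.
        i += matched if matched else 1
    return tags


if __name__ == "__main__":
    # Hypothetical dictionary entries; multi-word terms become token tuples.
    terms = {"ibuprofen": "DRUG", "headache": "ADR", "renal failure": "DISEASE"}
    dictionary = {tuple(k.split()): v for k, v in terms.items()}
    tokens = "Ibuprofen may cause headache and acute renal failure".split()
    for token, tag in zip(tokens, bio_tag(tokens, dictionary)):
        print(token, tag)
```

Longest-match-first is one simple way to resolve overlapping dictionary terms (e.g. "renal failure" versus a hypothetical separate entry "failure"). The resulting token/tag pairs are the kind of structured markup a sequence-labelling NER model can then be trained on.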
Keywords: machine learning, deep learning, neural networks, natural language processing, medical drug instructions, semantic graph.