R. E. Suvorov, A. O. Shelmanov, M. A. Kamenskaya, I. V. Smirnov, “Information extraction from scientific texts using active machine learning”, Artificial Intelligence and Decision Making, 2017, Issue 4, Pages 40

Natural language processing

Information extraction from scientific texts using active machine learning

R. E. Suvorov, A. O. Shelmanov, M. A. Kamenskaya, I. V. Smirnov

Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow

Abstract: The paper addresses the task of extracting information from natural language texts using machine learning methods. Building an information extraction system based on machine learning usually requires large annotated text corpora. A second problem that arises when developing such systems is feature engineering. To address the first problem, we propose information extraction methods based on active machine learning techniques. To address the second, we investigate methods for generating the feature space from the results of deep linguistic analysis. Experiments with the proposed methods showed that active learning significantly reduces the labor required to create an information extraction system while maintaining the quality of the trained models. Generating the feature space from the results of deep linguistic analysis improves the quality of information extraction models.

Keywords: information extraction, deep linguistic analysis, active machine learning, multipurpose feature engineering, scientific text processing.