Abstract:
This article continues a series of works devoted to a technology for supporting concrete historical research. The technology is based on the principles of co-creation and crowdsourcing and is designed for a wide range of users who are not professional historians or biographers. The article develops the technology further by integrating a mechanism that automatically identifies potentially promising areas of research. The proposed approach is to automatically fill information gaps in the set of facts describing the object of research by means of incomplete induction. The basis for inductive generalization is analyzed, and the ways of forming it are shown. The possibility of using data imputation, a procedure commonly used in data analysis and machine learning tasks, for this purpose is substantiated. Data imputation methods are analyzed in connection with the features of the technology and the specifics of concrete historical research. The analysis showed that it is expedient to build the mechanism for automatic hypothesis formation on a data imputation method that uses classification trees based on the CHAID (Chi-squared Automatic Interaction Detection) algorithm.
Keywords: concrete historical investigation, distributed technology, formation of hypotheses, information gap, data imputation.