Abstract:
This article continues a series of works devoted to a technology for supporting concrete historical research. The technology is based on the principles of co-creation and crowdsourcing and is designed for a wide range of users who are not professional historians or biographers. The expediency of using machine learning methods to expand the list of concrete historical research tasks solved within the described technology is shown. Data preparation is noted to be especially important because concrete historical information is fragmented and inconsistent. The article focuses on the specifics of cleansing concrete historical data and analyzes whether mechanisms and algorithms already integrated into the technology can be used for this purpose. The main directions in which data cleansing is carried out are listed, and suitable tools already included in the technology are identified for each direction. Particular attention is paid to tools for eliminating inconsistencies. The stages of data cleansing are listed, and a scheme of the interaction of all mechanisms and algorithms described in the article is given.
Keywords: concrete historical investigation, distributed technology, machine learning, data cleansing, data inconsistency.