JOURNALS // Sistemy i Sredstva Informatiki [Systems and Means of Informatics] // Archive

Sistemy i Sredstva Inform., 2023 Volume 33, Issue 3, Pages 149–160 (Mi ssi904)

This article is cited in 1 paper

Data cleansing in the technology of concrete historical investigation support

I. M. Adamovich, O. I. Volkov

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation

Abstract: This article continues a series of works on a technology for supporting concrete historical research. The technology is based on the principles of co-creation and crowdsourcing and is intended for a wide range of users who are not professional historians or biographers. The article shows that it is expedient to use machine learning methods to expand the list of concrete historical research tasks solved within the described technology. Because concrete historical information is fragmented and inconsistent, data preparation is especially important. The article examines the specifics of cleansing concrete historical data and analyzes whether mechanisms and algorithms already integrated into the technology can be used for this purpose. It lists the main directions of data cleansing and identifies, for each direction, suitable tools already included in the technology. Particular attention is paid to tools for eliminating inconsistencies. The stages of data cleansing are listed, and a scheme of the interaction of all the mechanisms and algorithms described in the article is given.

Keywords: concrete historical investigation, distributed technology, machine learning, data cleansing, data inconsistency.

Received: 02.05.2023

DOI: 10.14357/08696527230313


