Abstract:
The article continues a series of works on a technology for supporting concrete historical research. The technology is based on the principles of co-creation and crowdsourcing and is designed for a wide range of users who are not professional historians or biographers. The article develops the technology further by integrating into it a mechanism, based on cluster analysis, for the automated detection of anomalies in concrete historical information. It analyzes the specific features of concrete historical data and the ways they are represented in the technology's object model. Methods for digitizing mixed (numeric and categorical) data and the proximity measures applicable to them are considered in detail, and the advantages and disadvantages of clustering algorithms used for anomaly detection are evaluated. Based on this analysis, an approach to detecting anomalies in the technology's data is developed, and directions are outlined for testing the effectiveness of the selected algorithms and proximity measures on real concrete historical data.
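The abstract does not name the specific proximity measure or clustering algorithm that was selected. The sketch below therefore assumes one common pairing for mixed data as an illustration only: Gower's distance, where numeric fields are range-normalized and categorical fields use simple matching, combined with DBSCAN, where records labelled as noise are treated as candidate anomalies. The record layout (birth year, death year, birthplace, occupation), the sample values, and the `eps` and `min_pts` settings are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Gower distance between two mixed-type records.

    ranges[i] is the value range of numeric feature i, or None if feature i
    is categorical. Features missing (None) in either record are skipped.
    """
    total, count = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # missing value: feature gets zero weight for this pair
        if r is None:  # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        else:  # numeric: absolute difference scaled by the feature range
            total += abs(x - y) / r if r > 0 else 0.0
        count += 1
    # Assumption: records sharing no comparable fields are treated as maximally distant.
    return total / count if count else 1.0


def feature_ranges(records, numeric):
    """Range of each numeric feature over the data set; None for categorical ones."""
    ranges = []
    for i, is_numeric in enumerate(numeric):
        if not is_numeric:
            ranges.append(None)
            continue
        values = [rec[i] for rec in records if rec[i] is not None]
        ranges.append(max(values) - min(values) if values else 0.0)
    return ranges


def dbscan_noise(records, numeric, eps, min_pts):
    """Indices of records that DBSCAN labels as noise (candidate anomalies)."""
    ranges = feature_ranges(records, numeric)
    n = len(records)
    # eps-neighbourhood of each record (includes the record itself).
    neighbours = [
        [j for j in range(n) if gower_distance(records[i], records[j], ranges) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    # Noise: neither a core point nor within eps of any core point.
    return [i for i in range(n) if i not in core and not any(j in core for j in neighbours[i])]


# Hypothetical biographical records: (birth_year, death_year, birthplace, occupation)
records = [
    (1850, 1910, "Tula", "teacher"),
    (1852, 1915, "Tula", "teacher"),
    (1849, 1908, "Tula", "priest"),
    (1851, 1912, "Tula", "teacher"),
    (1890, 1850, "Paris", "sailor"),  # death year precedes birth year
]
numeric = [True, True, False, False]

print(dbscan_noise(records, numeric, eps=0.3, min_pts=3))  # → [4]
```

With these settings the first four records form one dense group, while the last record is far from all of them and is reported as noise. In practice, `eps` and `min_pts` would have to be tuned on real concrete historical data, which is the kind of testing the abstract outlines.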