Abstract:
The article further develops a distributed, crowdsourcing-based technology for supporting concrete historical research, aimed at a wide range of users who are not professional historians or biographers. The technology is extended with an entity resolution algorithm for processing nominative documents. The algorithm performs collective resolution: the entities behind the references being matched are determined jointly rather than independently for each pair of references. It is a modification of the greedy agglomerative clustering algorithm. The article describes in detail the approach underlying the algorithm and gives its high-level pseudocode. The effectiveness of the algorithm is analyzed on data with varying degrees of name ambiguity, and the degree of name ambiguity in concrete historical data is estimated. The article concludes that including the algorithm in the technology is worthwhile and outlines directions for further research on choosing the algorithm's configurable parameters.
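The abstract does not reproduce the pseudocode. The Python sketch below therefore only illustrates the general technique it names, greedy agglomerative clustering with a collective (relational) similarity term. It is not the authors' modification. Every detail is an assumption made for illustration: a reference is a hypothetical (name, document id) pair, and references from the same nominative document count as related. Attribute similarity is token Jaccard similarity of names. The weight `alpha` and the stopping `threshold` are arbitrary. A conservative bootstrap merges identical names only when they co-occur with another identical name. The data are invented.

```python
from itertools import combinations

# Hypothetical example data: (name, document id) for each person reference.
REFS = [
    ("Ivan Petrov", 1), ("Anna Petrova", 1), ("Pavel Petrov", 1),
    ("Ivan Petrov", 2), ("Anna Petrova", 2), ("Pavel Petrov", 2),
    ("I. Petrov", 3), ("Anna Petrova", 3), ("Pavel Petrov", 3),
    ("Ivan Petrov", 4), ("Maria Sidorova", 4),
]


def jaccard(x: set, y: set) -> float:
    return len(x & y) / len(x | y) if x | y else 0.0


def name_sim(a: str, b: str) -> float:
    """Assumed attribute similarity: Jaccard over lower-cased name tokens."""
    return jaccard(set(a.lower().split()), set(b.lower().split()))


def resolve(refs, alpha=0.5, threshold=0.6):
    """Greedy agglomerative clustering with a relational term (illustrative only)."""
    n = len(refs)
    # Assumption: references from the same document are relational neighbours.
    neighbours = [{j for j in range(n) if j != i and refs[j][1] == refs[i][1]}
                  for i in range(n)]
    # Cluster id of each reference; an id is always the smallest member index.
    cluster_of = list(range(n))

    def merge(keep: int, drop: int) -> None:
        for i in range(n):
            if cluster_of[i] == drop:
                cluster_of[i] = keep

    def groups() -> dict:
        out = {}
        for i, c in enumerate(cluster_of):
            out.setdefault(c, set()).add(i)
        return out

    # Bootstrap (assumed rule): merge identical names that also share at
    # least one identically named co-occurring reference.
    for i, j in combinations(range(n), 2):
        ctx_i = {refs[a][0].lower() for a in neighbours[i]}
        ctx_j = {refs[b][0].lower() for b in neighbours[j]}
        ci, cj = cluster_of[i], cluster_of[j]
        if refs[i][0].lower() == refs[j][0].lower() and ctx_i & ctx_j and ci != cj:
            merge(min(ci, cj), max(ci, cj))

    # Greedy phase: repeatedly merge the most similar pair of clusters.
    while True:
        g = groups()
        # Relational context of a cluster: the clusters its references co-occur with.
        ctx = {c: {cluster_of[j] for i in m for j in neighbours[i]} for c, m in g.items()}
        best, pair = -1.0, None
        for c1, c2 in combinations(sorted(g), 2):
            attr = sum(name_sim(refs[i][0], refs[j][0]) for i in g[c1] for j in g[c2])
            attr /= len(g[c1]) * len(g[c2])  # average-linkage name similarity
            score = (1 - alpha) * attr + alpha * jaccard(ctx[c1], ctx[c2])
            if score > best:
                best, pair = score, (c1, c2)
        if pair is None or best < threshold:
            break
        merge(*pair)  # contexts are recomputed next round: the collective step
    return sorted(g.values(), key=min)


if __name__ == "__main__":
    for cluster in resolve(REFS):
        print(sorted(cluster), [REFS[i][0] for i in sorted(cluster)])
```

The relational context of a cluster is expressed in terms of other clusters. Each merge can therefore change the scores of later candidate pairs, and this is what makes the resolution collective. On the toy data, "I. Petrov" joins the Ivan Petrov of documents 1 and 2 through shared context. The identically named Ivan Petrov of document 4 stays separate because his context differs. The full pairwise rescan after every merge keeps the sketch short but would be too slow for large collections.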