Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2024, Volume 34, Issue 1, Pages 128–138 (Mi ssi930)

This article is cited in 1 paper

Collective entity resolution in technology of concrete historical investigation support

I. M. Adamovich, O. I. Volkov

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation

Abstract: The article further develops a distributed technology for supporting concrete historical investigations. The technology is based on crowdsourcing principles and aimed at a wide range of users who are not professional historians or biographers. It is extended with an entity resolution algorithm for processing nominative documents. The algorithm performs collective resolution, in which the entities corresponding to matched references are determined jointly rather than pair by pair, and it is a modification of the greedy agglomerative clustering algorithm. The article describes the approach underlying the algorithm in detail and gives its high-level pseudocode. The algorithm's effectiveness is analyzed on data with varying degrees of name ambiguity, and the degree of name ambiguity in concrete historical data is estimated. The article concludes that including the algorithm in the technology is advisable and outlines directions for further research on determining the algorithm's configurable parameters.
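To make the idea of collective resolution concrete, here is a minimal Python sketch of the general technique: greedy agglomerative clustering of name references where the merge score combines name similarity with a relational similarity (overlap of the clusters a reference co-occurs with). It is not the authors' algorithm, whose pseudocode is given in the full article. The function names, the linear weighting by a parameter `alpha`, the stopping `threshold`, and the use of `difflib.SequenceMatcher` for name similarity are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_sim(a: str, b: str) -> float:
    # Hypothetical attribute similarity of two name strings, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def collective_er(names, documents, alpha=0.3, threshold=0.6):
    """Greedy agglomerative collective entity resolution (illustrative sketch).

    names     -- names[i] is the name string of reference i
    documents -- iterables of reference indices that co-occur in one document
    alpha     -- hypothetical weight of the relational similarity
    threshold -- hypothetical minimum score for a merge
    Returns a list mapping each reference index to a cluster id.
    """
    n = len(names)
    cluster = list(range(n))  # every reference starts as its own entity
    # References that co-occur with reference i in some document.
    neighbours = [set() for _ in range(n)]
    for doc in documents:
        for i in doc:
            neighbours[i] |= set(doc) - {i}

    def members(c):
        return [i for i in range(n) if cluster[i] == c]

    def attr_sim(ca, cb):
        # Average pairwise name similarity between the two clusters.
        pairs = [(i, j) for i in members(ca) for j in members(cb)]
        return sum(name_sim(names[i], names[j]) for i, j in pairs) / len(pairs)

    def rel_sim(ca, cb):
        # Jaccard overlap of the clusters that each cluster co-occurs with;
        # it changes as other merges happen, which makes resolution collective.
        na = {cluster[k] for i in members(ca) for k in neighbours[i]}
        nb = {cluster[k] for i in members(cb) for k in neighbours[i]}
        if not na and not nb:
            return 0.0
        return len(na & nb) / len(na | nb)

    while True:
        best, pair = threshold, None
        for ca, cb in combinations(sorted(set(cluster)), 2):
            score = (1 - alpha) * attr_sim(ca, cb) + alpha * rel_sim(ca, cb)
            if score >= best:  # greedy: keep the highest-scoring pair
                best, pair = score, (ca, cb)
        if pair is None:  # no pair reaches the threshold: stop merging
            return cluster
        ca, cb = pair
        cluster = [ca if c == cb else c for c in cluster]  # merge cb into ca


# "I. Petrov" and "Ivan Petrov" each co-occur with an "Anna Smirnova".
print(collective_er(["I. Petrov", "Ivan Petrov", "Anna Smirnova", "Anna Smirnova"],
                    [{0, 2}, {1, 3}]))
```

In this toy run the two "Anna Smirnova" references merge first on name alone. After that, the abbreviated and full forms of Petrov share a co-occurring entity, and their score rises above the threshold. Without that shared context, the name similarity alone (0.8 weighted by 0.7, i.e. 0.56) stays below 0.6, so the two Petrov references would remain separate. This dependence of later merges on earlier ones is what distinguishes collective from pairwise resolution.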

Keywords: concrete historical investigation, distributed technology, entity resolution, greedy algorithm, relational similarity measure.

Received: 09.01.2024

DOI: 10.14357/08696527240111


