Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2024, Volume 34, Issue 2, Pages 123–133 (Mi ssi940)

Method for searching for optimal parameter values of the entity resolution algorithm for concrete historical data

I. M. Adamovich, O. I. Volkov

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation

Abstract: The article is devoted to applying a collective entity resolution method to concrete historical investigation when processing nominative sources. The method is based on a new relational clustering algorithm, which is a modification of the greedy agglomerative clustering algorithm. The article proposes a method for searching for optimal parameter values of the collective entity resolution algorithm in tasks related to concrete historical investigation. The method rests on three components: an analysis of the specifics of concrete historical data; a comparison of these data with test data for which estimates of the algorithm's effectiveness are available; and a procedure for finding the optimal process parameters according to the Gauss–Seidel scheme, which sequentially searches for the optimum of the function with respect to each variable in turn. The proposed method makes it possible to apply the considered entity resolution algorithm in real concrete historical research for automated record linkage in nominative sources.
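The Gauss–Seidel scheme mentioned in the abstract (optimizing one variable at a time while the others stay fixed at their latest values) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the parameter names `alpha` and `threshold`, their candidate grids, and the `toy_quality` objective are hypothetical stand-ins. In the article's setting the objective would be the effectiveness of the entity resolution algorithm estimated on test data, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def gauss_seidel_search(
    objective: Callable[[Params], float],
    grids: Dict[str, List[float]],
    start: Params,
    max_sweeps: int = 10,
) -> Tuple[Params, float]:
    """Maximize `objective` one parameter at a time (Gauss-Seidel order).

    Each sweep visits the parameters in a fixed order. For the current
    parameter, every value in its grid is tried while all other parameters
    keep their latest values, and the best value is adopted immediately.
    The search stops when a full sweep brings no improvement.
    """
    current = dict(start)
    best_score = objective(current)
    for _ in range(max_sweeps):
        improved = False
        for name, candidates in grids.items():
            for value in candidates:
                trial = {**current, name: value}  # change only one parameter
                score = objective(trial)
                if score > best_score:  # strict: ties keep the earlier value
                    best_score, current = score, trial
                    improved = True
        if not improved:
            break
    return current, best_score


# Hypothetical quality function standing in for effectiveness on test data;
# its maximum (0.0) is at alpha = 0.6, threshold = 0.3.
def toy_quality(p: Params) -> float:
    return -(p["alpha"] - 0.6) ** 2 - (p["threshold"] - 0.3) ** 2


grids = {
    "alpha": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],      # hypothetical weight grid
    "threshold": [0.1, 0.2, 0.3, 0.4, 0.5],       # hypothetical merge threshold
}
start = {"alpha": 0.0, "threshold": 0.5}

best, score = gauss_seidel_search(toy_quality, grids, start)
print(best, score)
```

Searching over discrete grids keeps each one-dimensional step a simple scan, which suits parameters such as similarity thresholds that are usually tuned over a coarse set of values. For objectives with strongly interacting parameters, several sweeps may be needed before the search settles.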

Keywords: concrete historical investigation, distributed technology, entity resolution, algorithm parameters, relational similarity measure.

Received: 15.03.2024

DOI: 10.14357/08696527240209

© Steklov Math. Inst. of RAS, 2024