
Sistemy i Sredstva Informatiki [Systems and Means of Informatics], 2014, Volume 24, Issue 1, Pages 224–243 (Mi ssi339)

This article is cited in 3 papers

The tasks of identification of informational objects in area-spread data arrays

M. M. Gershkovich, T. K. Birukova

Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: An approach is presented for identifying informational objects (IOs) in automated information systems used for data collection, storage, and processing. Such systems consist of multiple nodes and acquire data from multiple sources. In the majority of cases, the data array of an information system takes the form of a continuously filled event log. Each event record includes characteristics of the event's participant (an IO) and of the event's conditions. To solve analytical problems related to IOs, one must identify them, i.e., determine the set of IOs that, with a certain probability, are the same entity. The paper defines the typical IO identification tasks that arise in the development of large-scale information systems: IO fusion and IO clustering, the latter meaning the formation of an aggregate of IOs that are similar with respect to certain criteria. The identification task is closely connected to the task of identifying links between IOs, since the probability that IOs are identical is higher if each of them is associated with another object. Methods for solving these tasks are presented, the special features of IO identification in a flow of events are studied, and a correlation search method for detecting associations between IOs is described. A method is presented for comparing proper names that takes into account probable distortions (phonetic and transcriptional) and misprints. The efficacy of using Cyrillic and Latin first-name/second-name blocks simultaneously for personal identification is substantiated, and methods for transcription from Cyrillic to Latin and vice versa are presented.

Keywords: identification of informational objects; identification of objects; correlation search; search for associations; identity of objects; fusion of informational objects; fusion of objects; text attributes; data distortions; phonetic distortions; transcriptional errors; Latin to Cyrillic transcription; Cyrillic to Latin transcription; Metaphone; Levenshtein distance; spread systems; area-spread systems; hierarchical systems; flow of events.

Received: 26.02.2014

DOI: 10.14357/08696527140114




