Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2017 Volume 27, Issue 1, Pages 100–107 (Mi ssi505)

This article is cited in 2 papers

On the main types of relatedness between text documents

M. M. Charnine, N. V. Somin

Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: This paper examines relatedness between natural language texts as determined by textual features (fragments). Two types of relatedness are identified: explicit relatedness, where texts are linked by bibliographic references, and implicit relatedness, where texts are linked through common text fragments. The advantages and applications of implicit relatedness are discussed. It is shown that using implicit relatedness significantly widens the scope of text processing techniques that rely on relatedness between texts. Measures of explicit and implicit relatedness are proposed. An experiment on a set of texts from the subject area of “computer graphics” showed that the two types of relatedness are correlated with each other. The authors identified the text processing parameters at which the correlation is maximal, reaching about 55%. A plan for further development of the proposed text comparison method and for refinement of the results is outlined.

Keywords: relatedness between texts; explicit relatedness; implicit relatedness; measure of relatedness; collection of texts; correlation.
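
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not define the proposed measures, so everything below is an assumption made for illustration only: implicit relatedness is approximated by the Jaccard overlap of word n-grams, explicit relatedness by the Jaccard overlap of cited-reference lists, and the agreement between them by the Pearson correlation over all document pairs. The function names, the n-gram size, and the toy corpus are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def word_ngrams(text, n=2):
    """Return the set of word n-grams (text fragments) found in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def implicit_relatedness(text_a, text_b, n=2):
    """Assumed measure: share of common word n-grams between two texts."""
    return jaccard(word_ngrams(text_a, n), word_ngrams(text_b, n))


def explicit_relatedness(refs_a, refs_b):
    """Assumed measure: share of common bibliographic references."""
    return jaccard(set(refs_a), set(refs_b))


def pearson(xs, ys):
    """Pearson correlation; 0.0 if either sequence has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical toy corpus: document id -> (text, cited references).
corpus = {
    "d1": ("ray tracing renders global illumination", ["r1", "r2"]),
    "d2": ("ray tracing renders soft shadows", ["r1", "r3"]),
    "d3": ("mesh simplification reduces polygon count", ["r4"]),
}

# Score every document pair with both measures, then correlate the scores.
pairs = list(combinations(corpus, 2))
implicit = [implicit_relatedness(corpus[a][0], corpus[b][0]) for a, b in pairs]
explicit = [explicit_relatedness(corpus[a][1], corpus[b][1]) for a, b in pairs]
correlation = pearson(implicit, explicit)
```

In such a sketch the n-gram size plays the role of a text processing parameter: varying it and recomputing the correlation is one way to look for the setting at which the two measures agree best.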

Received: 29.10.2016

DOI: 10.14357/08696527170107


