JOURNALS // Sistemy i Sredstva Informatiki [Systems and Means of Informatics] // Archive

Sistemy i Sredstva Inform., 2023, Volume 33, Issue 4, Pages 115–125 (Mi ssi916)

Graph $n$-grams in the text attribution problem

N. D. Moskin, A. A. Rogov, A. A. Lebedev

Petrozavodsk State University, 33 Lenina Prosp., Petrozavodsk 185910, Russian Federation

Abstract: The paper presents results of research on modeling the structure of texts with a generalized context-dependent graph-theoretic model. The study focuses mainly on literary and folklore texts for which the attribution problem arises; many such texts are found, for example, among the works of the famous Russian writer F. M. Dostoevsky. The authors show how to build hybrid models based on dependency trees, graph models of the syntactic structure of links between simple sentences in a multicomponent complex sentence, and “strong links” graphs of word combinations of different grammatical classes. Such models make it possible to construct new informative features that are potentially applicable to text attribution. One example is the frequency of occurrence of graph $n$-grams, which generalize ordinary $n$-grams, syntactic $n$-grams, and other similar constructions used in stylistic studies. The article also discusses a format for storing texts, their generalized graph models, and graph $n$-grams.
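
As a rough illustration of the two constructions that graph $n$-grams are said to generalize, the sketch below counts ordinary $n$-grams (consecutive tokens) and syntactic $n$-grams (head-to-dependent paths in a dependency tree) for one sentence. It does not reproduce the authors' definition of a graph $n$-gram, which the abstract does not give. The function names, the `heads` encoding, and the toy sentence with its hand-made tree are all assumptions made for this example.

```python
from collections import Counter


def linear_ngrams(tokens, n):
    """Ordinary n-grams: runs of n consecutive tokens in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens, heads, n):
    """Syntactic n-grams: head-to-dependent paths of n nodes in a dependency tree.

    heads[i] is the index of the head of token i, or -1 for the root
    (an encoding assumed for this sketch).
    """
    paths = []
    for i in range(len(tokens)):
        path = [i]
        # Walk upward from token i toward the root, collecting ancestors.
        while len(path) < n and heads[path[-1]] != -1:
            path.append(heads[path[-1]])
        if len(path) == n:
            # Reverse so the path reads from the topmost head down to token i.
            paths.append(tuple(tokens[j] for j in reversed(path)))
    return paths


# Hypothetical toy sentence with a hand-made dependency tree (illustration only).
tokens = ["the", "old", "man", "reads", "books"]
heads = [2, 2, 3, -1, 3]  # "reads" is the root

# Frequency counts of this kind could serve as stylistic features.
linear_counts = Counter(linear_ngrams(tokens, 2))
syntactic_counts = Counter(syntactic_ngrams(tokens, heads, 2))
```

Here the ordinary bigrams follow word order, so ("old", "man") is counted, while the syntactic bigrams follow tree edges, so ("reads", "man") is counted even though the two words appear in the opposite order in the sentence.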

Keywords: artificial intelligence, text attribution, graph, metagraph, hybrid graph, folklore text, literary text, graph $n$-gram.

Received: 01.07.2023

DOI: 10.14357/08696527230411


