Vestnik Sankt-Peterburgskogo Universiteta. Seriya 10. Prikladnaya Matematika. Informatika. Protsessy Upravleniya

Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr., 2021 Volume 17, Issue 4, Pages 389–396 (Mi vspui505)

Computer science

Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines

R. V. Abramov, K. A. Kulakov, A. A. Lebedev, N. D. Moskin, A. A. Rogov

Petrozavodsk State University, 33, pr. Lenina, Petrozavodsk, 185910, Russian Federation

Abstract: The paper studies the publicistic style of F. M. Dostoevsky on the basis of publications in the journals “Time” and “Epoch” (1861–1865). To this end, text fragments of 500, 700 and 1000 words were selected (including fragments by other authors: M. M. Dostoevsky, N. N. Strakhov, A. A. Golovachev, etc.), and the occurrences of bigrams and trigrams (encoded sequences of parts of speech) in them were counted. Decision trees were built from these counts, and the accuracy of text recognition was analyzed. For classification at the first level of the tree (fragment size 1000), the accuracy was on average 87 [...] resulting decision trees.
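
The counting step described in the abstract can be illustrated with a minimal sketch. It assumes each text has already been converted to a sequence of part-of-speech tags; the tag labels, the function names and the toy input are hypothetical, and the paper's actual encoding and tooling are not given here.

```python
from collections import Counter


def split_fragments(tags, size):
    """Split a tag sequence into consecutive fragments of `size` tokens.

    A trailing remainder shorter than `size` is dropped (an assumption;
    the source does not say how leftovers are handled).
    """
    return [tags[i:i + size] for i in range(0, len(tags) - size + 1, size)]


def ngram_counts(tags, n):
    """Count every run of n consecutive part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def relative_frequencies(tags, n):
    """Turn raw n-gram counts into shares of all n-grams in the fragment."""
    counts = ngram_counts(tags, n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy input with hypothetical tag labels (not the paper's encoding).
tags = ["N", "V", "ADJ", "N", "V", "ADJ", "N", "V"]

fragments = split_fragments(tags, 4)   # two fragments of four tags each
bigrams = ngram_counts(tags, 2)        # counts of tag pairs
trigrams = ngram_counts(tags, 3)       # counts of tag triples
```

Per-fragment frequency vectors of this kind could then serve as features for a decision tree classifier; in the study the fragment sizes are 500, 700 and 1000 words rather than the toy size used above.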

Keywords: publicistic style, text attribution, decision tree, $n$-gram, F. M. Dostoevsky, information system “Statistical methods for analyzing literary texts”, tree matching.

UDC: 004.8

MSC: 68T50

Received: December 25, 2020
Accepted: October 13, 2021

Language: English

DOI: 10.21638/11701/spbu10.2021.407


