Model. Anal. Inform. Sist., 2020 Volume 27, Number 3, Pages 330–343 (Mi mais719)

Theory of data

Automated search and analysis of the stylometric features that describe the style of the prose 19th-21st centuries

K. V. Lagutina, A. M. Manakhova

P. G. Demidov Yaroslavl State University, Sovetskaya str., 14, Yaroslavl, 150003, Russia

Abstract: The article compares stylometric features of several levels that serve as markers of prose style, and analyzes stylistic changes in Russian and British prose of the 19th–21st centuries. The stylometric features include low-level features based on words and characters and high-level features based on rhythm. These features model the style of a text and indicate the time when the text was created.
All features are calculated fully automatically, which makes it possible to conduct large-scale experiments on long literary works and speeds up the work of a linguist. The ProseRhythmDetector program is used to calculate the stylometric features, including those based on the results of searching for rhythmic figures. As a result, each text is represented as a set of the same features at three levels: characters, words, and rhythm. Texts are grouped by decade, and the average values of the stylometric features are computed for each decade. The resulting decade models are compared using standard similarity metrics, and the comparison results are visualized as heat maps and dendrograms. Experiments with two corpora of Russian and British texts show that over the 19th–21st centuries there are trends in style change common to both corpora, for example, a decrease in the number of rhythmic figures per sentence, as well as trends specific to each language, for example, the dynamics of word and sentence lengths. Stylometric features of all levels reveal similarity in the style of texts published within the same century. In addition, features of the three levels taken together demonstrate the uniqueness of each decade better than features of any single level. This study shows the importance of stylometric features as style markers of different eras and makes it possible to identify trends in style over several centuries.
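
The decade-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the ProseRhythmDetector implementation: the function names, the three-element feature vector (average word length, average sentence length, rhythmic figures per sentence), the sample values, and the choice of cosine similarity as the "standard similarity metric" are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def decade_of(year):
    """Map a publication year to the first year of its decade, e.g. 1857 -> 1850."""
    return year - year % 10


def average_by_decade(texts):
    """Average feature vectors of texts that fall into the same decade.

    texts: list of (year, feature_vector) pairs; all vectors have equal length.
    Returns a dict {decade: mean_feature_vector}, ordered by decade.
    """
    grouped = defaultdict(list)
    for year, features in texts:
        grouped[decade_of(year)].append(features)
    return {
        decade: [sum(column) / len(column) for column in zip(*vectors)]
        for decade, vectors in sorted(grouped.items())
    }


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(models):
    """Pairwise similarity of decade models; this is the data behind a heat map."""
    decades = list(models)
    matrix = [
        [cosine_similarity(models[p], models[q]) for q in decades]
        for p in decades
    ]
    return decades, matrix


# Hypothetical feature vectors, invented for this example:
# [average word length, average sentence length, rhythmic figures per sentence]
texts = [
    (1851, [4.0, 20.0, 2.0]),
    (1857, [6.0, 22.0, 4.0]),
    (1903, [5.0, 15.0, 1.0]),
]

models = average_by_decade(texts)
decades, matrix = similarity_matrix(models)
```

The resulting matrix could then be passed to a plotting library for a heat map, or converted to distances for hierarchical clustering and a dendrogram. In practice the features would likely need to be normalized first, since raw values on different scales let the largest one dominate the similarity.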

Keywords: text rhythm, rhythm analysis, natural language processing, stylometry, rhythm figures, automation.

UDC: 004.912

MSC: 68T50

Received: 14.05.2020
Revised: 08.06.2020
Accepted: 10.06.2020

DOI: 10.18255/1818-1015-2020-3-330-343


