Preprints of the Keldysh Institute of Applied Mathematics

Keldysh Institute Preprints, 2017, 032, 21 pp. (Mi ipmp2248)

Statistical text language recognition with the use of $n$-gram frequency

Yu. N. Orlov, S. A. Shilin

Abstract: Statistical properties of texts in European languages are investigated using a recognition procedure based on $n$-gram distribution patterns. A numerical algorithm is constructed for analyzing the Hurst exponent of letter-distance distributions in a text fragment. The accuracy of binary recognition is estimated at 0.99.
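The abstract does not spell out the recognition rule. As a minimal sketch, assume that each language is represented by a reference profile of relative letter $n$-gram frequencies and that a fragment is assigned to whichever of two profiles is closer in $L_1$ distance. The function names (`ngram_freqs`, `l1_distance`, `classify_binary`), the letters-only preprocessing and the choice of $L_1$ distance are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def ngram_freqs(text: str, n: int) -> dict[str, float]:
    """Relative frequencies of letter n-grams in `text`.

    Assumption: non-letters are dropped and case is folded before counting,
    so n-grams may span word boundaries.
    """
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    grams = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute frequency differences over the union of n-grams."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_binary(fragment: str, ref_a: dict[str, float],
                    ref_b: dict[str, float], n: int = 2) -> str:
    """Return "A" or "B": the reference profile nearest to the fragment.

    Hypothetical decision rule (nearest profile in L1 distance); ties go to "A".
    """
    profile = ngram_freqs(fragment, n)
    if l1_distance(profile, ref_a) <= l1_distance(profile, ref_b):
        return "A"
    return "B"
```

For example, with `ref_a = ngram_freqs(english_corpus, 2)` and `ref_b = ngram_freqs(german_corpus, 2)` (both corpora hypothetical), `classify_binary(fragment, ref_a, ref_b)` returns `"A"` when the fragment's bigram profile is closer to the English reference.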

Keywords: text language recognition, $n$-gram frequency.
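The abstract mentions a numerical algorithm for the Hurst exponent of letter-distance distributions but does not describe it. A minimal sketch, assuming that "letter distances" are the gaps between consecutive occurrences of the same letter and using classical rescaled-range (R/S) analysis as a stand-in estimator: the gap sequence is split into non-overlapping windows of doubling size, and $H$ is the least-squares slope of $\log \overline{R/S}$ against $\log w$. The helper names (`letter_gaps`, `rescaled_range`, `hurst_rs`) and the default `min_window=8` are hypothetical.

```python
import math
from typing import Optional


def letter_gaps(text: str, letter: str) -> list[int]:
    """Distances between consecutive occurrences of `letter`,
    counted in letters (non-letters dropped, case folded)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    positions = [i for i, ch in enumerate(letters) if ch == letter]
    return [b - a for a, b in zip(positions, positions[1:])]


def rescaled_range(window: list[float]) -> Optional[float]:
    """R/S statistic of one window; None if the window has zero variance."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return None
    cum = lo = hi = 0.0
    for x in window:
        cum += x - mean  # cumulative deviation from the window mean
        lo, hi = min(lo, cum), max(hi, cum)
    return (hi - lo) / std


def hurst_rs(series: list[float], min_window: int = 8) -> float:
    """Estimate H as the least-squares slope of log(mean R/S) vs log(window)."""
    log_w, log_rs = [], []
    w = min_window
    while w <= len(series) // 2:
        # Average R/S over non-overlapping windows of size w,
        # skipping zero-variance windows.
        ratios = []
        for start in range(0, len(series) - w + 1, w):
            rs = rescaled_range(series[start:start + w])
            if rs is not None:
                ratios.append(rs)
        if ratios:
            log_w.append(math.log(w))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        w *= 2  # doubling window sizes
    if len(log_w) < 2:
        raise ValueError("series too short for R/S analysis")
    mx, my = sum(log_w) / len(log_w), sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_w, log_rs))
    den = sum((x - mx) ** 2 for x in log_w)
    return num / den
```

For example, `hurst_rs(letter_gaps(fragment, "e"))` yields one value per chosen letter; values near 0.5 suggest uncorrelated gaps and values above 0.5 persistent ones. With the defaults, at least 32 gaps (33 occurrences of the letter) are needed so that two window sizes enter the fit, and R/S estimates are known to be biased upward on short series.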

DOI: 10.20948/prepr-2017-32
