Vestnik Sankt-Peterburgskogo Universiteta. Seriya 10. Prikladnaya Matematika. Informatika. Protsessy Upravleniya

Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr., 2024 Volume 20, Issue 3, Pages 391–403 (Mi vspui634)

Computer science

Extending the applicability of the Zipf's laws to the sequences of byte data

S. L. Sergeev, I. S. Blekanov, F. V. Ezhov, N. A. Tarasov

St. Petersburg State University, 7-9, Universitetskaya nab., St. Petersburg, 199034, Russian Federation

Abstract: Zipf's law has been shown to hold in many settings. From its origin as a statistical phenomenon observed in natural language to its later adaptations in economic, social, and many other fields, it has been shown to apply almost universally. In all of these cases, authors discuss the applicability of Zipf's law in terms of semantically complex structures. We take this notion a step further and show how the law can be applied to data analysis, in particular to sequences of byte data obtained from various sources. We show that, using a basic chunking methodology, Zipf's law holds for many different types of raw byte sequences. In particular, the law holds in all cases for the "middle point" of the data, where it is present with a degree of certainty of more than 90%. We conclude by discussing the implications and potential use cases of these findings.
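
The chunking and frequency analysis mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the chunk size, the fitting procedure, or how the "degree of certainty" is measured, so everything here is an assumption made for illustration: the function names `chunk_frequencies` and `zipf_fit` are hypothetical, chunks are fixed-size and non-overlapping, and the fit is an ordinary least-squares line of log-frequency against log-rank, with R² used as a stand-in goodness-of-fit measure.

```python
import math
from collections import Counter


def chunk_frequencies(data: bytes, chunk_size: int = 2) -> list[int]:
    """Count non-overlapping fixed-size chunks; return counts in descending order.

    Assumption: fixed-size chunking, with any trailing partial chunk dropped.
    """
    usable = len(data) - len(data) % chunk_size
    counts = Counter(data[i:i + chunk_size] for i in range(0, usable, chunk_size))
    return sorted(counts.values(), reverse=True)


def zipf_fit(freqs: list[int]) -> tuple[float, float]:
    """Least-squares fit of log(frequency) against log(rank).

    Returns (slope, r_squared). A slope near -1 with R^2 near 1 is the
    classic Zipfian signature. R^2 is an illustrative choice here, not
    necessarily the measure used in the paper.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct chunks to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # If all frequencies are equal, the flat line fits exactly.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared
```

To try it on a real file, one could read the file in binary mode (`open(path, "rb").read()`), pass the bytes to `chunk_frequencies`, and inspect the slope and R² returned by `zipf_fit`, optionally restricting the fit to the middle ranks of the distribution.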

Keywords: Zipf's laws, byte data, chunking, frequency analysis.

UDC: 004.93

MSC: 93B03

Received: May 19, 2024
Accepted: June 25, 2024

Language: English

DOI: 10.21638/spbu10.2024.307


