Sib. Èlektron. Mat. Izv., 2019 Volume 16, Pages 1822–1832 (Mi semr1170)

Probability theory and mathematical statistics

A statistical test for the Zipf's law by deviations from the Heaps' law

M. G. Chebunin (a, b), A. P. Kovalevskii (c, b)

a Sobolev Institute of Mathematics, 4, Koptyuga ave., Novosibirsk, 630090, Russia
b Novosibirsk State University, 1, Pirogova str., Novosibirsk, 630090, Russia
c Novosibirsk State Technical University, 20, K. Marksa ave., 630073, Novosibirsk, Russia

Abstract: We explore a probabilistic model of an artistic text: the words of the text are chosen independently of each other according to a discrete probability distribution on an infinite dictionary. The words are enumerated 1, 2, $\ldots$, and the probability of the $i$th word is asymptotically a power function of $i$. Bahadur proved that in this case the number of different words, as a function of the length of the text, also behaves asymptotically like a power function. On the other hand, the applied statistics community knows two statements supported by empirical observations, Zipf's law and Heaps' law. We highlight the links between Bahadur's results and Zipf's and Heaps' laws, and we introduce and analyse a corresponding statistical test.
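The model in the abstract can be illustrated with a small simulation. The sketch below is not the statistical test of the paper; it only draws independent words from a power-law distribution and tracks the number of different words. Several choices are assumptions made for illustration: the probabilities are taken exactly proportional to $i^{-a}$ (the abstract requires this only asymptotically), the infinite dictionary is truncated to `dict_size` words, and the function names `simulate_distinct_counts` and `fit_heaps_exponent` are hypothetical. For such probabilities with $a > 1$, the classical asymptotics suggest that the number of different words grows roughly like $n^{1/a}$, so a log-log least-squares slope near $1/a$ is expected.

```python
import math
import random
from itertools import accumulate


def simulate_distinct_counts(n_words, zipf_exponent, dict_size, seed=0):
    """Return [R_1, ..., R_n], where R_k is the number of different words
    among the first k independently drawn words."""
    rng = random.Random(seed)
    # Assumption: p_i is exactly proportional to i^(-zipf_exponent), and the
    # infinite dictionary is truncated to dict_size words.
    cum_weights = list(
        accumulate(i ** -zipf_exponent for i in range(1, dict_size + 1))
    )
    text = rng.choices(range(1, dict_size + 1), cum_weights=cum_weights, k=n_words)
    seen, counts = set(), []
    for word in text:
        seen.add(word)
        counts.append(len(seen))
    return counts


def fit_heaps_exponent(counts, n_min=1000, points=30):
    """Least-squares slope of log R_n against log n on a geometric grid.

    This is a rough descriptive estimate, not the test from the paper.
    """
    n_max = len(counts)
    grid = sorted(
        {
            min(n_max, round(n_min * (n_max / n_min) ** (j / (points - 1))))
            for j in range(points)
        }
    )
    xs = [math.log(n) for n in grid]
    ys = [math.log(counts[n - 1]) for n in grid]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    a = 2.0  # illustrative Zipf exponent; the expected Heaps exponent is near 1/a
    counts = simulate_distinct_counts(n_words=100_000, zipf_exponent=a,
                                      dict_size=100_000, seed=1)
    print("different words:", counts[-1])
    print("fitted Heaps exponent:", fit_heaps_exponent(counts))
```

The truncation level matters: `dict_size` should be large enough that words beyond it would almost never be drawn in a text of `n_words` words, otherwise the count of different words saturates and the fitted slope is biased downward.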

Keywords: Zipf's law, Heaps' law, weak convergence.

UDC: 519.233

MSC: 62F03

Received September 24, 2019, published December 4, 2019

Language: English

DOI: 10.33048/semi.2019.16.129


