Abstract:
A model is proposed for the rank-ordered frequency series of the letters in a language. According to this model, the $r$th frequency in the ordered series, $p(r)$, is approximately given by
$$
\frac{1}{n}\left(\frac{1}{r}+\frac{1}{r+1}+\dots+\frac{1}{n}\right)\approx\frac{1}{n}\bigl(\ln(n+1)-\ln r\bigr),
$$
where $n$ is the total number of letters in the language. The frequency distribution of the letters in the Russian language fits this model.
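As an illustration, the formula can be evaluated numerically. The following is a minimal sketch, not the author's procedure: the function names `model_frequencies` and `log_approximation` are hypothetical, and the value `n = 33` is only an assumed example, since the abstract does not state which $n$ is used for Russian. The sketch computes the harmonic-tail form and its logarithmic approximation, and checks that the model frequencies sum to one.

```python
import math

def model_frequencies(n):
    """Return p(r) = (1/n) * (1/r + 1/(r+1) + ... + 1/n) for r = 1..n."""
    freqs = [0.0] * n
    tail = 0.0
    # Accumulate the harmonic tail from k = n down to k = r.
    for r in range(n, 0, -1):
        tail += 1.0 / r
        freqs[r - 1] = tail / n
    return freqs

def log_approximation(n):
    """Return (1/n) * (ln(n+1) - ln r) for r = 1..n."""
    return [(math.log(n + 1) - math.log(r)) / n for r in range(1, n + 1)]

n = 33  # assumed alphabet size, for illustration only
exact = model_frequencies(n)
approx = log_approximation(n)

total = sum(exact)  # the model frequencies form a probability distribution
max_gap = max(abs(e - a) for e, a in zip(exact, approx))

for r in (1, 2, 5, 10, n):
    print(f"r={r:2d}  exact={exact[r - 1]:.4f}  approx={approx[r - 1]:.4f}")
print(f"sum of p(r) = {total:.6f}, largest gap = {max_gap:.4f}")
```

The sum equals one because each term $1/k$ appears in exactly $k$ of the tails, for $r = 1, \dots, k$. The logarithmic form never exceeds the harmonic-tail form, since $1/k \ge \ln(k+1) - \ln k$ for every $k \ge 1$.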