Abstract:
The expected value $E$ of the longest common subsequence of letters in two random words is considered as a function of the $\alpha=|A|$ of alphabet and of words lengths $m$ and $n$.
It is assumed that each letter independently appears at any position with equal probability.
An approximate analitic expression for $E(\alpha, m, n)$ calculation is presented that allow to calculate the $E (m, n, \alpha) $ with an accuracy of $0.3$ percent for $ 64 \leqslant m + n \leqslant 65,536 $ and $ 1 <\alpha < 129$.
The coefficients are selected by hand and can be refined.
It is expected that the formula holds for each grater values of the argument with the same relative error.
Key words and phrases:similarity of strings, sequence alignment, edit distance, LCS, Levenshtein metric.