
Program Systems: Theory and Applications, 2016 Volume 7, Issue 1, Pages 201–208 (Mi ps207)

This article is cited in 2 papers

Mathematical Foundations of Programming

A picture of common subsequence length for two random strings over an alphabet of 4 symbols

S. V. Znamenskij

Ailamazyan Program System Institute of RAS

Abstract: The length of the longest common subsequence (LCS) of a pair of random finite sequences over an alphabet of 4 characters is considered as a random function of the sequence lengths $m$ and $n$. Tables of exact probability distributions are presented for all pairs of lengths in the range $2<m+n<19$.
The graphs of the expected value and the standard deviation as functions of the lengths are shown in linear perspective, which displays the behaviour at large lengths near the horizon. To illustrate the behaviour at large lengths, the results of numerical simulation for $m+n=32$, 512, 8192 and 131072 are also shown on the same graphs. The graph of the expected value as a function of $m$ and $n$ appears to have a right circular cone as its asymptote. The variance appears to grow as $(n+m)^{\frac34}$.
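
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes the LCS length, an exact distribution for small $m$ and $n$, and a simulated mean for larger lengths. It is not the author's method: the function names (`lcs_length`, `exact_distribution`, `simulated_mean`) are hypothetical, the exact distribution is obtained by plain enumeration of all $4^{m+n}$ pairs (practical only for very small lengths, far below the paper's range $m+n<19$), and the simulation assumes independent, uniformly distributed characters.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction


def lcs_length(a, b):
    """Length of a longest common subsequence, classic O(len(a)*len(b)) DP."""
    prev = [0] * (len(b) + 1)          # DP row for the previous prefix of a
    for x in a:
        cur = [0]                      # LCS with an empty prefix of b is 0
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def exact_distribution(m, n, k=4):
    """Exact LCS-length distribution over all k**(m+n) pairs (brute force).

    Assumption: every pair of sequences is equally likely.
    """
    counts = Counter()
    for a in itertools.product(range(k), repeat=m):
        for b in itertools.product(range(k), repeat=n):
            counts[lcs_length(a, b)] += 1
    total = k ** (m + n)
    return {length: Fraction(c, total) for length, c in sorted(counts.items())}


def simulated_mean(m, n, k=4, trials=1000, seed=0):
    """Monte Carlo estimate of the expected LCS length for lengths m and n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(m)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / trials
```

For example, `exact_distribution(1, 1)` gives probability $\frac14$ for length 1 and $\frac34$ for length 0, since two independent characters from a 4-letter alphabet coincide with probability $\frac14$. For the large lengths quoted in the abstract, the quadratic-time dynamic programme in pure Python would be slow, so a faster LCS routine would be needed in practice.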

Key words and phrases: similarity of strings, sequence alignment, edit distance, LCS, Levenshtein metric.

UDC: 004.416

Received: 25.12.2015
Accepted: 28.03.2016

Language: English


