Abstract:
Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $\gamma$, i.e., the limit of the normalised LCS length for two random binary strings of length $n \to \infty$, is a natural problem, first posed by Chvátal and Sankoff in 1975 and still unresolved. The problem is relevant to diverse fields, ranging from combinatorics and algorithm analysis to coding theory and computational biology. In a previous paper [47], we used methods of statistical mechanics, together with some existing results on the combinatorial structure of LCS, to link the constant $\gamma$ to the parameters of a certain stochastic particle process. Here, we complement this analysis by formulating the problem in the language of symbolic dynamics and cellular automata, and by reporting some preliminary results of a computational experiment aimed at improving the existing numerical estimates for $\gamma$. We also point out an error in the previous paper [47], which invalidates some of its claims on the properties of $\gamma$.
Key words and phrases: random strings, longest common subsequence, the Chvátal–Sankoff problem, particle processes, symbolic dynamics, cellular automata.
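
To make the quantity in the abstract concrete, the following minimal sketch estimates the normalised LCS length for a finite $n$ by naive Monte Carlo sampling. It is an illustration only, not the computational experiment reported in the paper: the function names `lcs_length` and `estimate_normalised_lcs`, the seed, and the sample sizes are all assumptions made here, and the textbook quadratic-time dynamic programme stands in for whatever method the authors use.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b.

    Textbook dynamic programme, O(len(a) * len(b)) time, keeping only
    one row of the table at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_normalised_lcs(n, trials, seed=0):
    """Average of LCS(a, b) / n over `trials` pairs of uniformly random
    binary strings of length n (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(2) for _ in range(n)]
        b = [rng.randrange(2) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    # Small sizes only; the quadratic cost makes large n expensive.
    for n in (50, 200, 800):
        print(n, estimate_normalised_lcs(n, trials=20))
```

Because the expected LCS length is superadditive in $n$, the expected value of such a finite-$n$ average does not exceed $\gamma$, so a sketch of this kind can only suggest the limit from below and says nothing rigorous about its exact value.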