Abstract:
We develop and study the concept of similarity functions for $q$-ary sequences.
For the case $q=4$, these functions can be used for a mathematical model of the DNA duplex
energy [1, 2], which has a number of applications in molecular biology. Based on these similarity
functions, we define a concept of DNA codes [1]. We give brief proofs for some of our unpublished
results [3] connected with the well-known deletion similarity function [4–6]. This function is
the length of the longest common subsequence; it is used in the theory of codes that correct
insertions and deletions [5]. The principal results of the present paper concern another function,
called the similarity of blocks. The difference between this function and the deletion similarity
is that the common subsequences under consideration should satisfy an additional biologically
motivated [2] block condition, so that not all common subsequences are admissible. We prove
some lower bounds on the size of an optimal DNA code for the block similarity function. We also
consider a construction of close-to-optimal DNA codes that are subcodes of the parity-check
one-error-detecting code in the Hamming metric [7].
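
As an illustration of the deletion similarity described above, the following minimal sketch computes the length of the longest common subsequence of two sequences by standard dynamic programming. The function name `deletion_similarity` and the use of strings over the alphabet A, C, G, T for the case $q=4$ are assumptions made for this example only. The block similarity is not sketched, since the block condition is not specified in this abstract.

```python
def deletion_similarity(x, y):
    """Return the length of the longest common subsequence of x and y.

    Illustrative sketch only: the name and the string representation of
    q-ary sequences are assumptions, not notation from the paper.
    """
    # prev[j] holds the LCS length of the processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend a common subsequence that ends just before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of x or of y, whichever leaves more.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(deletion_similarity("ACGT", "ACGT"))
    print(deletion_similarity("ACGT", "TGCA"))
```

The sketch keeps only two rows of the dynamic programming table, so it uses memory proportional to the length of the second sequence and time proportional to the product of the two lengths.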