Abstract:
Counting the occurrences of substrings of length $N<12$ in the nucleotide sequences of the first and second DNA strands of prokaryotic and eukaryotic genomes yields data arrays with a high degree of linear (Pearson) correlation. Changing the algorithm by which the complementary strand is generated (the second from the first), or decreasing the length of the analyzed genomic DNA fragments, makes the correlations vanish. The high degree of symmetry between the first and second genomic strands is explained by the presence of a large number of complementary split palindromes, which may be regarded as a form of perfect, inverted, spaced repeats. The emergence of this complementary-strand symmetry in a pseudorandom DNA matrix is demonstrated with a computer model based on an algorithm that generates random repeats and long deletions, the latter keeping the length of the nucleotide text transformed in silico constant.
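The strand comparison and the repeat/deletion model summarized above can be illustrated with a minimal sketch. It assumes that the second strand is the reverse complement of the first read 5'→3', that substring counts are taken over overlapping windows for all $4^N$ words of length $N$, and that the model inserts inverted (reverse-complement) copies of random fragments and then deletes a block of equal length. The names `kmer_counts`, `strand_correlation` and `evolve` are hypothetical and the model is a guess at the general idea, not the authors' implementation.

```python
import random
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Second strand read 5'->3' (assumed convention)."""
    return seq.translate(COMPLEMENT)[::-1]

def kmer_counts(seq: str, k: int) -> list[int]:
    """Overlapping-window counts of all 4**k words of length k, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product("ACGT", repeat=k)]

def pearson(x: list[int], y: list[int]) -> float:
    """Linear (Pearson) correlation coefficient; NaN if either array is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5

def strand_correlation(seq: str, k: int) -> float:
    """Correlate k-word counts of the first strand with those of its reverse complement."""
    return pearson(kmer_counts(seq, k), kmer_counts(reverse_complement(seq), k))

def evolve(seq: str, steps: int, max_len: int, rng: random.Random) -> str:
    """Hypothetical toy model: insert an inverted copy of a random fragment,
    then delete a random block of the same length so the total length is unchanged."""
    n = len(seq)
    for _ in range(steps):
        length = rng.randint(1, max_len)
        start = rng.randrange(n - length + 1)
        repeat = reverse_complement(seq[start:start + length])
        pos = rng.randrange(n + 1)
        seq = seq[:pos] + repeat + seq[pos:]
        cut = rng.randrange(len(seq) - length + 1)
        seq = seq[:cut] + seq[cut + length:]
    return seq

if __name__ == "__main__":
    rng = random.Random(1)
    initial = "".join(rng.choice("ACGT") for _ in range(20000))
    final = evolve(initial, 2000, 500, rng)
    print(strand_correlation(initial, 6), strand_correlation(final, 6))
```

Applying `strand_correlation` to a short slice such as `final[:2000]`, or replacing `reverse_complement` with a plain complement, gives a way to probe the two conditions under which the abstract reports that the correlation disappears.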
The authors sincerely thank colleagues from the Institute of Biology of the Ufa Scientific Center of the RAS, where most of this work was done, and the participants of the school “Future of Applied Mathematics” for their support and discussion.