Abstract:
An analysis of the data from the first release of the HeteroGenome database, which collects the regions of latent periodicity revealed in the genomes of a number of eukaryotic organisms, is presented. Tandem repeats with different degrees of conservation of the pattern copies, including highly diverged repeats, were identified in the genomes of S. cerevisiae, A. thaliana, C. elegans and D. melanogaster. The data were obtained with an original spectral-statistical approach to searching for reliable regions of latent periodicity in DNA sequences. Introducing a two-level structure for data presentation (at the first, nonredundant level the regions of latent periodicity are viewed as a whole; at the second level only the fragments with a conservative periodic structure are considered) made it possible to estimate the share of the genome covered by regions of latent periodicity, which amounts to $\sim10\%$ of the whole genome length. This estimate is derived from the first-level data. An analysis of the quantitative and qualitative content (corresponding to the divergence levels) of the latent periodicity regions over all chromosomes of the considered organisms revealed the types of periodicity characteristic of each organism's genome. Histograms showing the density distribution of the latent periodicity regions along every chromosome of the analyzed genomes were built. A repertoire of period lengths was revealed in the genomes. Moreover, the HeteroGenome database offers additional possibilities for analysis of its data and is freely available at http://www.jcbi.ru/lp_baze/.
Key words: latent periodicity, approximate tandem repeats, genome analysis.