Journal: Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2008 special issue, Pages 86–111 (Mi ssi158)

On linguistic classification of bacterial genomes

Zeev Volkovich (a), Valery Kirzhner (b), Zeev Barzily (a)

(a) Software Engineering Department, ORT Braude College of Engineering, Karmiel, Israel
(b) Institute of Evolution, University of Haifa, Haifa, Israel

Abstract: The paper classifies 185 complete prokaryote genomes using a modification of the compositional spectra method. The modification computes the compositional spectra separately for the coding and non-coding subsequences of the genome. For each subsequence, a corresponding vector in Euclidean space is obtained through certain manipulations of the compositional spectra. This makes it possible to analyze the structure of the genome and to determine the most probable number of genome clusters without any additional information. The clustering method is based on external indexes of partition agreement and on the number of misclassified items within repeated partitions. A biological justification, given for the four-letter and two-letter alphabets, supports the appropriateness of the results obtained.
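The two ingredients named in the abstract, a word-frequency vector for a sequence and an external index of agreement between two partitions, can be illustrated with a minimal sketch. This is not the authors' procedure. The exact-match k-mer counting, the choice of the Rand index as the agreement measure, and the function names `kmer_spectrum` and `rand_index` are all assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations, product


def kmer_spectrum(seq, k=2, alphabet="ACGT"):
    """Normalized frequencies of all k-letter words over `alphabet`.

    Simplifying assumption: exact-match counting of overlapping words,
    which is not necessarily how the paper builds its spectra.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    One common external agreement index, chosen here as an example.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Under these assumptions, the coding and non-coding parts of a genome would each be passed to `kmer_spectrum` to get one vector apiece, and `rand_index` would compare cluster labels from repeated runs of a clustering algorithm for a candidate number of clusters.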

Language: English


