Abstract:
Genome-wide association studies play a key role in identifying relationships between genomes and phenotypes. Many studies in this field investigate genetic variations and their interactions within genomes. Despite significant progress, the problem remains highly relevant and still requires effective methods and algorithms. In this paper, we propose four new algorithms that find combinations of single nucleotide polymorphisms (SNPs) associated with phenotypes by studying SNP interactions in two modes, additive and multiplicative. In the first stage, the algorithms perform an exhaustive search over SNP pairs to predict their association with the phenotype. In the second stage, greedy procedures are applied to find combinations of up to five SNPs with the best association values. The approach is tested on a dataset of 3178 Mycobacterium tuberculosis genomes, where it is used to identify SNP combinations and to predict the resistance of Mycobacterium tuberculosis strains to 20 drugs. The results are compared with those of the modern prediction software systems Mykrobe and TBProfiler. For the five first-line drugs and one second-line drug (ofloxacin), Mykrobe and TBProfiler slightly exceed the prediction accuracy of the proposed algorithms; for the other 14 second-line drugs, the proposed algorithms are more accurate.
Keywords: genome-wide association studies; GWAS; single nucleotide polymorphism interaction; drug-resistant tuberculosis
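
The two-stage search described in the abstract (exhaustive evaluation of SNP pairs, then greedy extension to at most five SNPs) can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: genotypes are a binary matrix (1 = SNP present), the phenotype is binary (1 = resistant), the "additive" mode is read as "at least one SNP present" and the "multiplicative" mode as "all SNPs present", and plain accuracy stands in for the paper's association value. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def combine(rows, snps, mode):
    """Predict a binary phenotype for each sample from the selected SNP columns.

    ASSUMPTION: 'additive' means the sum of the SNP indicators is > 0,
    'multiplicative' means their product is 1. The abstract does not
    define the two modes, so this reading is illustrative only.
    """
    if mode == "additive":
        return [int(sum(r[j] for j in snps) > 0) for r in rows]
    if mode == "multiplicative":
        return [int(all(r[j] for j in snps)) for r in rows]
    raise ValueError(f"unknown mode: {mode}")


def accuracy(pred, y):
    """Fraction of samples predicted correctly (stand-in association score)."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def best_pair(rows, y, mode):
    """Stage 1: exhaustive search over all SNP pairs."""
    n_snps = len(rows[0])
    return max(
        combinations(range(n_snps), 2),
        key=lambda pair: accuracy(combine(rows, pair, mode), y),
    )


def greedy_extend(rows, y, start, mode, max_size=5):
    """Stage 2: add one SNP at a time while the score strictly improves."""
    chosen = list(start)
    best = accuracy(combine(rows, chosen, mode), y)
    n_snps = len(rows[0])
    while len(chosen) < max_size:
        best_j = None
        for j in range(n_snps):
            if j in chosen:
                continue
            score = accuracy(combine(rows, chosen + [j], mode), y)
            if score > best:
                best, best_j = score, j
        if best_j is None:  # no candidate improves the score
            break
        chosen.append(best_j)
    return chosen, best


# Toy data (hypothetical): 6 samples x 4 SNPs; resistance is caused by
# SNP 0, SNP 2, or SNP 3, while SNP 1 is unrelated.
rows = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
y = [1, 1, 1, 0, 0, 1]

pair = best_pair(rows, y, "additive")
combo, score = greedy_extend(rows, y, pair, "additive")
print(pair, combo, score)
```

On this toy data no pair explains every resistant sample, so the greedy stage adds a third SNP. The exhaustive stage costs O(m^2) score evaluations for m SNPs, while each greedy step costs only O(m), which is presumably why a greedy procedure is used beyond pairs.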