
Journal of the Belarusian State University. Mathematics and Informatics, 2024, Volume 2, Pages 104–112 (Mi bgumi690)

Theoretical foundations of computer science

Simulation modelling of single nucleotide genetic polymorphisms

N. N. Yatskou^a, V. V. Apanasovich^b, V. V. Grinev^a

a Belarusian State University, 4 Niezaliezhnasci Avenue, Minsk 220030, Belarus
b Independent researcher, Minsk, Belarus

Abstract: We propose an approach to identifying single nucleotide polymorphisms (SNPs) in DNA sequences based on simulation modelling of single-nucleotide sites, in which random events are generated from beta or normal distributions whose parameters are estimated from the available experimental data. The approach improves the accuracy of SNP detection in DNA molecules, makes it possible to assess the reliability of specific experiments, and allows the errors of parameters determined under real experimental conditions to be estimated. The simulation model and analysis methods are verified on a reference set of human genomic DNA sequencing data provided by the Genome in a Bottle Consortium. Existing statistical SNP identification algorithms are compared with machine learning methods trained on simulated human genomic DNA sequencing data. The machine learning models perform best: their SNP identification accuracy is 2–5 % higher than that of the classical statistical methods.
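The abstract describes simulating single-nucleotide sites by drawing random values from a beta or normal distribution whose parameters are estimated from experimental data. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes (an assumption not stated in the abstract) that the simulated quantity is the variant allele fraction at a site and that the beta parameters are fitted by the method of moments. The functions `fit_beta_moments`, `fit_normal` and `simulate_sites` and the example fractions are hypothetical.

```python
import random
import statistics


def fit_beta_moments(samples):
    """Method-of-moments estimates (alpha, beta) for values in (0, 1)."""
    m = statistics.fmean(samples)
    v = statistics.variance(samples)  # sample variance (n - 1 denominator)
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("mean/variance not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def fit_normal(samples):
    """Sample mean and sample standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)


def simulate_sites(observed, n_sites, dist="beta", seed=None):
    """Draw n_sites simulated allele fractions from a distribution fitted to observed."""
    rng = random.Random(seed)  # seeded generator for reproducible simulations
    if dist == "beta":
        a, b = fit_beta_moments(observed)
        return [rng.betavariate(a, b) for _ in range(n_sites)]
    if dist == "normal":
        mu, sigma = fit_normal(observed)
        # A normal draw can fall outside [0, 1], so clamp it to a valid fraction.
        return [min(1.0, max(0.0, rng.gauss(mu, sigma))) for _ in range(n_sites)]
    raise ValueError(f"unknown distribution: {dist!r}")


# Hypothetical allele fractions at heterozygous sites (not data from the paper).
observed = [0.42, 0.51, 0.47, 0.55, 0.49, 0.38, 0.53]
simulated = simulate_sites(observed, 10_000, dist="beta", seed=42)
print(round(statistics.fmean(simulated), 2))
```

The method of moments is used here only because it gives closed-form estimates. The beta distribution is bounded on [0, 1], which suits proportions such as allele fractions. Normal draws, by contrast, must be clamped to that interval.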

Keywords: single nucleotide polymorphism; SNP; SNP identification; simulation modelling; machine learning

UDC: 57.087.1

Received: 22.01.2024
Revised: 01.07.2024
Accepted: 01.07.2024



© Steklov Math. Inst. of RAS, 2024