Special issue based on the proceedings of the V International Conference “Physics - Life Sciences”, St. Petersburg, October 15-19, 2023
Physical methods in agro- and genetic breeding technologies
Construction of a nucleotide sequence using machine learning methods in the “NANOFOR SPS” sequencer
Abstract:
The development of mathematical methods and information technologies for data processing plays an essential role in identifying features of the analyzed nucleic acids and is a necessary element in developing and improving instruments for practical use in biology and medicine. Massively parallel sequencing of nucleic acids involves measuring fluorescence signal intensities by mathematically processing images acquired with video cameras, and then constructing a nucleotide sequence from these measurements. The paper considers information processing methods that fall into two groups. The first group comprises methods for filtering images, detecting fluorescence clusters, and estimating fluorescence signal parameters, both for isolated clusters and for clusters “superimposed” on each other. The second group comprises methods for constructing a sequence of DNA nucleotide letter codes from the fluorescence signal intensities obtained directly from image processing. No corrections are applied to these signals for intensity changes caused by phenomena such as phasing/prephasing, signal attenuation, and cross-talk. These methods rely on machine learning classifiers. Testing of various machine learning models on the nucleotide sequence construction task showed sufficiently high genetic analysis quality: Phred quality scores ranged from 29 to 35 on the reference genome of bacteriophage PhiX174.
Keywords: sequencing, nucleic acids, image processing, improving the quality of genetic analysis, machine learning.
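For orientation, the Phred quality score Q is related to the probability P that a base call is wrong by Q = -10 log10 P, so the reported range Q29 to Q35 corresponds to error probabilities of roughly 1.3·10^-3 down to 3.2·10^-4. The Python sketch below illustrates how a classifier-based base caller can attach such a score to each call. It is not the method implemented in the NANOFOR SPS software: the abstract does not name the models, so a softmax linear classifier over the four raw channel intensities of one cycle is assumed here, with hand-set, hypothetical weights (`W_DEMO`, `B_DEMO`) standing in for trained ones, and the quality cap `Q_MAX = 41` is an arbitrary illustrative choice. As in the paper, the raw intensities are not corrected for phasing/prephasing, attenuation, or cross-talk.

```python
import math

BASES = "ACGT"
Q_MAX = 41  # illustrative cap on reported quality (assumption, not from the paper)


def phred_from_error(p_err: float) -> int:
    """Phred quality Q = -10 * log10(P_error), rounded and capped at Q_MAX."""
    if p_err <= 0.0:
        return Q_MAX
    return min(Q_MAX, round(-10.0 * math.log10(p_err)))


def error_from_phred(q: float) -> float:
    """Inverse mapping: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(intensities, weights, biases):
    """Classify one cycle's 4-channel intensities (A, C, G, T order).

    Uses a hypothetical linear softmax classifier and returns the called
    base together with a Phred quality derived from the classifier's
    estimated error probability (1 - probability of the winning base).
    """
    scores = [sum(w * x for w, x in zip(row, intensities)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(BASES)), key=lambda i: probs[i])
    return BASES[best], phred_from_error(1.0 - probs[best])


# Hypothetical, hand-set parameters: each base's score is 10x its own channel.
W_DEMO = [[10.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B_DEMO = [0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    # One simulated cycle per row: raw intensities for channels A, C, G, T.
    cycles = [
        [1.00, 0.10, 0.05, 0.10],
        [0.20, 0.90, 0.15, 0.10],
        [0.30, 0.25, 0.35, 0.20],  # ambiguous cycle, expected to get low quality
    ]
    read = [call_base(c, W_DEMO, B_DEMO) for c in cycles]
    print("".join(b for b, _ in read), [q for _, q in read])
```

In practice the weights would presumably be learned from labeled cycles (for example, reads aligned to a known reference such as PhiX174), and the model's probabilities would need calibration for the reported Q values to match observed error rates; both points are assumptions rather than details taken from the paper.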