
Zhurnal Tekhnicheskoi Fiziki, 2024 Volume 94, Issue 9, Pages 1551–1560 (Mi jtf6862)

Specialized issue based on the materials of the V International Conference "Physics - Life Sciences", St. Petersburg, October 15-19, 2023
Physical methods in agro- and genetic breeding technologies

Construction of a nucleotide sequence using machine learning methods in the “NANOFOR SPS” sequencer

V. V. Manoilov, A. G. Borodinov, A. I. Petrov, I. V. Zarutsky, B. V. Bardin, A. Yu. Yamanovskaya, A. S. Saraev, V. E. Kurochkin

Institute for Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg

Abstract: The development of mathematical methods and information technologies for data processing is essential for identifying features of the nucleic acids under analysis, and it is a necessary part of developing and improving instruments for practical use in biology and medicine. Massively parallel sequencing of nucleic acids involves measuring the intensities of fluorescence signals through mathematical processing of images obtained from video cameras, and then constructing a nucleotide sequence from these measurements. The paper considers information processing methods in two parts. The first part covers methods for filtering images, detecting fluorescence clusters, and estimating the parameters of fluorescence signals, both for single clusters and for clusters "superimposed" on each other. The second part covers methods for constructing a sequence of DNA nucleotide letter codes from the fluorescence signal intensities obtained directly from image processing. These signals are not corrected for intensity changes caused by phasing/prephasing, signal attenuation, or cross-talk. The methods use classifiers based on machine learning. Testing of several machine learning models on the task of constructing a nucleotide sequence yielded sufficiently high quality indicators of genetic analysis: Phred quality scores ranged from 29 to 35 for the reference genome of bacteriophage PhiX174.
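For orientation, the Phred range quoted in the abstract can be translated into per-base error probabilities with the standard definition Q = -10 log10(P). The sketch below is purely illustrative and is not taken from the paper; the function names are hypothetical, and it assumes the reported values follow the conventional Phred definition.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability P.

    Uses the standard definition Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    """
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability P (0 < P <= 1) to a Phred score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    # Endpoints of the range reported in the abstract (assumed to be
    # conventional Phred scores).
    for q in (29, 35):
        p = phred_to_error_probability(q)
        print(f"Q{q}: error probability {p:.2e}, accuracy {100 * (1 - p):.3f}%")
```

Under this definition, a score of 29 corresponds to roughly 1.3 errors per 1000 base calls and a score of 35 to roughly 3 errors per 10,000.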

Keywords: sequencing, nucleic acids, image processing, improving the quality of genetic analysis, machine learning.

Received: 12.02.2024
Revised: 14.06.2024
Accepted: 08.07.2024

DOI: 10.61011/JTF.2024.09.58677.35-24




