Abstract:
Modern mathematical methods for protein analysis, such as database search and de novo methods, each have drawbacks. Database search cannot identify proteins that are absent from the databases. De novo methods can identify new proteins, but they are very computationally demanding, requiring the use of a supercomputer. In this project, an integrated approach to approximate protein analysis that runs on a personal computer was developed. The problem of qualitative and quantitative determination of the initial sequence (the protein) consists of three subproblems. The first is noise cancellation and peak detection in mass spectrometry data; for it, an algorithm combining a sliding average method with HDR technology from computational photography was developed. The second is peak identification, which was reduced to a knapsack problem and solved using the branch and bound method. The third is reconstruction of the initial sequence from a set of fragments (peaks and their intensities), which was solved by constructing double trees and searching for a path of maximum length. All calculations were performed on a PC using CUDA parallel computing technology.
Keywords: proteomics, the knapsack problem, the branch and bound method, parallel computing.
UDC: 519.85 BBK: 22.176
Received: November 9, 2021 Published: January 31, 2022
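
The abstract states that peak identification was reduced to a knapsack problem and solved by the branch and bound method, but it does not give the formulation. As an illustration only, the following is a minimal sketch of branch and bound for a generic 0/1 knapsack instance. The function name, the meaning of the weights and values, and the sample numbers are assumptions made for this example and are not taken from the work itself; the sketch also runs sequentially on a CPU rather than with CUDA.

```python
def knapsack_branch_and_bound(weights, values, capacity):
    """Solve a 0/1 knapsack instance; return (best_value, chosen_indices).

    Hypothetical helper for illustration; weights must be positive.
    """
    # Consider items in order of decreasing value density so that the
    # fractional (relaxed) bound below is as tight as possible.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)

    def upper_bound(pos, weight, value):
        # Optimistic estimate: fill the remaining room greedily and allow
        # a fraction of the first item that no longer fits.
        room = capacity - weight
        bound = value
        for i in order[pos:]:
            if weights[i] <= room:
                room -= weights[i]
                bound += values[i]
            else:
                bound += values[i] * room / weights[i]
                break
        return bound

    best_value = 0
    best_set = []

    def search(pos, weight, value, chosen):
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, list(chosen)
        if pos == len(order):
            return
        # Prune: this subtree cannot improve on the best solution so far.
        if upper_bound(pos, weight, value) <= best_value:
            return
        i = order[pos]
        # Branch 1: take item i if it fits.
        if weight + weights[i] <= capacity:
            chosen.append(i)
            search(pos + 1, weight + weights[i], value + values[i], chosen)
            chosen.pop()
        # Branch 2: skip item i.
        search(pos + 1, weight, value, chosen)

    search(0, 0, 0, [])
    return best_value, sorted(best_set)


# Made-up sample instance (not data from the work).
result = knapsack_branch_and_bound([10, 20, 30], [60, 100, 120], 50)
print(result)
```

For the made-up instance above, the search selects the second and third items for a total value of 220. The pruning step is what distinguishes branch and bound from exhaustive enumeration: a subtree is skipped whenever its optimistic bound cannot beat the best solution already found.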