N. N. Nazipova, E. A. Isaev, V. V. Kornilov, D. V. Pervukhin, A. A. Morozova, A. A. Gorbunov, M. N. Ustinin, “Big Data in bioinformatics”, Mat. Biolog. Bioinform., 2017, Volume 12, Issue 1, Pages 102

This article is cited in 7 papers

Information and Computer Technologies in Biology and Medicine

Big Data in bioinformatics

N. N. Nazipova^a, E. A. Isaev^b, V. V. Kornilov^b, D. V. Pervukhin^b, A. A. Morozova^c, A. A. Gorbunov^b, M. N. Ustinin^a

^a Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences
^b National Research University "Higher School of Economics"
^c The Union of Enterprises The Central Scientific and Production Association "CASCADE"

Abstract: Sequencing of the human genome began in 1994. It took 10 years of collaborative work by many teams to obtain a draft of human DNA. Modern sequencing technology makes it possible to read an individual genome in a few days. Advances in modern bioinformatics are tied to the emergence of high-throughput sequencing platforms, which have not only expanded the capabilities of biology and related sciences but also given rise to the phenomenon of Big Data. The paper substantiates the need for new technologies and methods for organizing the storage, management, analysis, and visualization of large data. Modern bioinformatics faces not only enormous volumes of heterogeneous data but also a huge variety of processing and presentation methods, software tools, and data formats. The paper discusses ways of meeting these challenges, in particular by drawing on achievements from other areas, such as web intelligence and business intelligence. New storage systems, other than relational ones, will help to solve the problem of archiving while keeping search query times acceptable. New programming technologies, namely generic programming and visual programming, can help to overcome the diversity of genomic data formats and enable experimenters to quickly create data-processing scripts.

Key words: Big Data, NGS, genome sequencing, information technologies, bioinformatics, generic programming, visual programming, nonrelational databases, NoSQL systems, Hadoop, MapReduce.

UDC: 004.9:004.9:004.8:577.21

Received 21.12.2016, Published 10.03.2017

DOI: 10.17537/2017.12.102