N. N. Nazipova, E. A. Isaev, V. V. Kornilov, D. V. Pervukhin, A. A. Morozova, A. A. Gorbunov, M. N. Ustinin, “Big Data in bioinformatics”, Mat. Biolog. Bioinform., 2018, Volume 13, Issue Suppl., Pages t1

This article is cited in 10 papers

Translations of Published Articles

Big Data in bioinformatics

N. N. Nazipova^a, E. A. Isaev^b, V. V. Kornilov^b, D. V. Pervukhin^b, A. A. Morozova^c, A. A. Gorbunov^b, M. N. Ustinin^a

^a Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences
^b National Research University "Higher School of Economics"
^c The Union of Enterprises the Central Scientific and Production Association "CASCADE"

Abstract: Sequencing of the human genome began in 1994. Producing a draft of human DNA took 10 years of collaborative work by many research groups from different countries. Modern technologies allow a whole genome to be sequenced in a few days. We discuss here the advances in modern bioinformatics related to the emergence of high-performance sequencing platforms, which not only expanded the capabilities of biology and related sciences, but also gave rise to the phenomenon of Big Data in biology. The need to develop new technologies and methods for organizing the storage, management, analysis and visualization of big data is substantiated. Modern bioinformatics faces not only the problem of processing enormous volumes of heterogeneous data, but also a variety of methods for interpreting and presenting results, and the simultaneous existence of various software tools and data formats. Ways of meeting these challenges are discussed, in particular by drawing on experience from other areas of modern life, such as web intelligence and business intelligence. The former is the area of scientific research and development that explores the impact of, and makes use of, artificial intelligence and information technology (IT) for new products, services and frameworks empowered by the World Wide Web; the latter is the domain of IT that addresses decision-making. New database management systems, other than relational ones, will help to solve the problem of storing huge volumes of data while keeping search query times acceptable. New programming technologies, such as generic programming and visual programming, are designed to cope with the diversity of genomic data formats and to make it possible to quickly create one’s own data-processing scripts.

Key words: Big Data, NGS, genome sequencing, IT technologies, bioinformatics, generic programming, visual programming, nonrelational databases, NoSQL systems, Hadoop, MapReduce.

UDC: 004.9:004.9:004.8:577.21

Received 16.03.2018, Published 03.04.2018

Language: English

DOI: 10.17537/2018.13.t1