
Mat. Biolog. Bioinform., 2018, Volume 13, Issue 1, Pages 159–168 (Mi mbb331)

Bioinformatics

New procedure of raw Illumina MiSeq data filtering for the amplicon metagenomic libraries

Yu. S. Bukin (a, b), L. S. Buzoleva (c, d), Yu. S. Golozubova (d), Yu. P. Galachyants (b, a)

a Irkutsk Scientific Center, Siberian Branch of the Russian Academy of Sciences, Russia
b Limnological Institute, Siberian Branch of the Russian Academy of Sciences, Irkutsk, Russia
c Somov Institute of Epidemiology and Microbiology, Vladivostok, Russia
d Far Eastern Federal University, Vladivostok, Russia

Abstract: We present an algorithm for filtering raw paired-end amplicon NGS data used to assess the genetic and taxonomic diversity of communities of unicellular microorganisms. The proposed approach avoids the massive data loss that occurs when raw sequences are filtered and increases the statistical representativeness of the analyzed amplicons. Furthermore, we show that standard trimming methods based on read quality filtering, such as the sliding-window approach, eliminate sequences belonging to different taxonomic groups unequally. This bias may skew taxon counts and deplete the estimated taxonomic diversity of the analyzed communities. The proposed method does not introduce errors of this kind. An R implementation of the algorithm, together with a number of example files for analysis, is available at https://github.com/barnsys/metagenomic_analysis.
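
For readers unfamiliar with the standard method that the abstract compares against, the following is a minimal Python sketch of generic sliding-window quality trimming. It is not the authors' algorithm (their R implementation is in the repository linked above) and it does not reproduce any particular trimming tool. The function names, the window size of 4, the mean-quality threshold of Q20, the minimum length, and the Phred+33 quality encoding are all assumptions chosen for illustration.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_cut(quals, window=4, min_mean_q=20):
    """Return how many leading bases to keep.

    Windows are scanned from the 5' end; the read is cut at the start of
    the first window whose mean quality falls below min_mean_q.
    """
    n = len(quals)
    if n == 0:
        return 0
    w = min(window, n)  # reads shorter than the window form a single window
    for start in range(n - w + 1):
        if sum(quals[start:start + w]) / w < min_mean_q:
            return start
    return n


def filter_read(seq, quality_string, window=4, min_mean_q=20, min_len=5):
    """Trim one read; return None if the trimmed read is shorter than min_len."""
    keep = sliding_window_cut(phred33(quality_string), window, min_mean_q)
    if keep < min_len:
        return None  # the whole read is discarded
    return seq[:keep]


# A read with a low-quality 3' tail is shortened.
print(filter_read("ACGTACGTACGT", "IIIIIIII####"))  # -> ACGTACG
# A read with a low-quality 5' start is discarded entirely.
print(filter_read("ACGTACGT", "####IIII"))          # -> None
```

The second call shows how a read can be lost completely under this kind of filter, which is the kind of data loss the abstract refers to.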

Key words: amplicon metagenomics, next-generation sequencing, meta-barcoding, quality control.

UDC: 573, 579, 579.8

Received 02.10.2017, Published 15.05.2018

Language: English

DOI: 10.17537/2018.13.159


