Abstract:
In this paper we present an algorithm to filter amplicon paired-end NGS raw data which is used to capture genetic and taxonomic diversity of communities of unicellular microorganisms. The suggested approach allows one to overcome the issue of massive data loss during filtration of raw sequences and increases the static representativeness of analyzed amplicons. Furthermore, an unequal elimination of sequences belonging to different taxonomic groups was shown to occur if one applies standard trimming methods based on filtration of quality of raw reads, for instance, using sliding window approach. This bias may result in a skew of taxon counts and depletion of taxonomic diversity of analyzed communities. The suggested method does not introduce the errors of this kind. The implementation of the algorithm in R as well as a number of example files for analysis is available at https://github.com/barnsys/metagenomic_analysis.
Key words:amplicon metagenomic, new generation sequencing, meta-barcoding, quality control.