Matematicheskaya Biologiya i Bioinformatika

Mat. Biolog. Bioinform., 2023 Volume 18, Issue 2, Pages 418–433 (Mi mbb527)

Bioinformatics

Performance analysis of cross-assembly of metatranscriptomic datasets in viral community studies

Yu. S. Bukin, A. N. Bondaryuk, T. V. Butina

Limnological Institute of the Siberian Branch of the RAS

Abstract: We compared individual assemblies and cross-assemblies of several metatranscriptomic datasets from endemic Baikal mollusks to evaluate their use in studying viral communities. We show that, compared with individual dataset assemblies, a hidden Markov model (HMM)-based cross-assembly procedure increases the number of viral contigs (or scaffolds) per sample, the number of virotypes identified, and the average scaffold length per sample. The proportion of viral reads assembled, relative to the total number of reads per sample, is also higher with cross-assembly. De novo cross-genomic assembly combined with an HMM-based virus identification algorithm yields a table giving, for each scaffold, the number of reads from each sample. This table allows samples to be compared by the representation of all viral scaffolds, including those not taxonomically identified, i.e., those with no analogues in the NCBI RefSeq database. Thus, cross-genomic assemblies allow comparative analyses that take the latent diversity of viruses into account. We propose a pipeline for metatranscriptomic data analysis that uses de novo cross-genomic assembly to study viral diversity.
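
As a rough illustration of the scaffold-by-sample read table mentioned in the abstract, the following minimal Python sketch builds such a table from read assignments and compares two samples. It is not the authors' pipeline: the sample and scaffold names, the toy read counts, the helper functions, and the choice of Bray-Curtis dissimilarity as the comparison measure are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_count_table(mappings):
    """Build a scaffold -> sample -> read-count table.

    `mappings` is an iterable of (sample, scaffold) pairs, one per mapped read.
    """
    table = defaultdict(lambda: defaultdict(int))
    for sample, scaffold in mappings:
        table[scaffold][sample] += 1
    return table


def relative_abundance(table, sample):
    """Return each scaffold's share of the viral reads mapped from one sample."""
    counts = {scaffold: row.get(sample, 0) for scaffold, row in table.items()}
    total = sum(counts.values())
    return {s: (c / total if total else 0.0) for s, c in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


# Hypothetical toy data: "scaffold_3" stands in for a viral scaffold with no
# taxonomic assignment; it still contributes to the comparison.
mappings = (
    [("S1", "scaffold_1")] * 6
    + [("S1", "scaffold_2")] * 2
    + [("S2", "scaffold_1")] * 3
    + [("S2", "scaffold_3")] * 5
)

table = build_count_table(mappings)
distance = bray_curtis(
    relative_abundance(table, "S1"),
    relative_abundance(table, "S2"),
)
```

Because every scaffold in the shared cross-assembly has a row in the table, taxonomically unidentified scaffolds enter the comparison on the same footing as identified ones, which is the property the abstract highlights.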

Key words: metagenomics, transcriptomics, viruses, viral communities, metagenomic assembly, cross-assembly, metatranscriptomic analysis, viral scaffolds.

Received 05.11.2023, 19.11.2023, Published 20.11.2023

DOI: 10.17537/2023.18.418


