Benchmarking of virome metagenomic analysis approaches using a large, 60+ members, viral synthetic community
Abstract
In contrast to microbial metagenomics, there have so far been only limited efforts to benchmark the performance of virome analysis approaches in terms of faithfulness to community structure and completeness of virome description. While natural communities are more readily accessible, synthetic communities assembled from well-characterized isolates allow more accurate performance evaluation. Starting from authenticated, quality-controlled reference isolates from the DSMZ Plant Virus Collection, we assembled synthetic communities of varying complexity, up to a highly complex community of 72 viral agents (115 viral molecules) comprising isolates from 21 viral families and 61 genera. These communities were then analyzed using two approaches frequently used in ecology-oriented plant virus metagenomics: a virion-associated nucleic acids (VANA)-based strategy and a highly purified double-stranded RNAs (dsRNAs)-based one. The results made it possible to compare the diagnostic sensitivity of these two approaches for groups of viruses and satellites with different genome types, and confirmed that the dsRNA-based approach provides a more complete representation of the RNA virome. However, for viromes of low to medium complexity, VANA appears to be a reasonable alternative and would be the preferred choice if analysis of DNA viruses is of importance. The results also made it possible to identify several important parameters and to propose hypotheses explaining differences in performance, in particular differences in how unevenly individual viruses are represented with each approach. Remarkably, these analyses highlight a strong direct relationship between the completeness of virome description and sample sequencing depth, which should prove useful in further virome analysis efforts.
IMPORTANCE We report here efforts to benchmark the performance of two widespread approaches for virome analysis, which target either virion-associated nucleic acids (VANA) or highly purified double-stranded RNAs (dsRNAs). This was achieved using synthetic communities of varying complexity levels, up to a highly complex community of 72 viral agents (115 viral molecules) comprising isolates from 21 families and 61 genera of plant viruses. The results confirm that the dsRNA-based approach provides a more complete representation of the RNA virome, in particular for high-complexity viromes. However, for viromes of low to medium complexity, VANA appears to be a reasonable alternative and would be the preferred choice if analysis of DNA viruses is of importance. Several parameters affecting performance were identified, as well as a direct relationship between the completeness of virome description and sample sequencing depth. The strategy, results, and tools used here should prove useful in a range of virome analysis efforts.
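As a minimal illustration of the relationship between completeness of virome description and sequencing depth, the sketch below computes completeness as the fraction of expected viral molecules detected, then recomputes it after random subsampling of the reads. It is not the authors' pipeline: the function names, the `min_reads` detection threshold, and the toy read counts are all assumptions made for illustration only.

```python
import random


def completeness(read_counts, min_reads=1):
    """Fraction of expected viral molecules with at least `min_reads` reads.

    `read_counts` maps each expected molecule (hypothetical names) to its
    number of mapped reads; `min_reads` is an assumed detection threshold.
    """
    if not read_counts:
        raise ValueError("read_counts must list at least one expected molecule")
    detected = sum(1 for n in read_counts.values() if n >= min_reads)
    return detected / len(read_counts)


def subsample(read_counts, depth, seed=0):
    """Draw `depth` reads without replacement and recount them per molecule."""
    rng = random.Random(seed)
    pool = [name for name, n in read_counts.items() for _ in range(n)]
    drawn = rng.sample(pool, min(depth, len(pool)))
    counts = {name: 0 for name in read_counts}
    for name in drawn:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Toy, deliberately unbalanced counts; not data from the study.
    toy = {"virus_A_RNA1": 900, "virus_A_RNA2": 90, "virus_B": 9, "satellite_C": 1}
    for depth in (10, 100, 1000):
        print(depth, completeness(subsample(toy, depth)))
```

With strongly unbalanced counts such as these, poorly represented molecules tend to drop out first as depth decreases, which is one simple way to picture why completeness depends on depth.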
Main file
2023-Schonegger-Journal_of_Virology-postprint.pdf (2.63 MB)
Origin: Files produced by the author(s)