Preprint, Working Paper. Year: 2025

CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data

Abstract

Metagenomic sequencing provides profound insights into microbial communities, but it is often compromised by technical biases, including cross-sample contamination. This underexplored phenomenon arises when microbial content is inadvertently exchanged among samples processed concurrently. Such contamination distorts microbial profiles and poses significant risks to the reliability of metagenomic data and downstream analyses. Despite its critical impact, the issue remains insufficiently addressed. To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Using a pre-trained supervised model, CroCoDeEL identifies contamination patterns in species abundance profiles with high accuracy. Unlike existing tools, it requires neither negative controls nor prior knowledge of the samples' processing positions, which makes it both more accurate and more versatile. Benchmarks on three public datasets show that CroCoDeEL accurately detects contaminated samples and identifies their contamination sources, even at low rates (<0.1%), provided that sequencing depth is sufficient. Our findings suggest that cross-sample contamination is prevalent in metagenomics and emphasize the need to systematically integrate contamination detection into sequencing data quality control.
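To make the notion of a "contamination rate" concrete, the following toy sketch assumes a simple linear mixing model: a contaminated sample's relative-abundance profile is (1 - r) times its own profile plus r times the profile of the contaminating source. This is not CroCoDeEL's method (the abstract states that CroCoDeEL relies on a pre-trained supervised model); it is only an illustration. The function names `mix_profiles` and `estimate_rate` are hypothetical, and the estimator assumes the uncontaminated profile is known, which is only the case in simulation.

```python
import numpy as np


def mix_profiles(target, source, rate):
    """Hypothetical linear mixing model: replace a fraction `rate` of the
    target's relative abundances with the source's relative abundances."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    return (1.0 - rate) * target + rate * source


def estimate_rate(observed, source, target_absent):
    """Naive estimate under the mixing model above (simulation only).

    For species absent from the clean target but present in the source,
    observed abundance = rate * source abundance, so each ratio
    observed / source recovers the rate; the median makes it robust.
    """
    observed = np.asarray(observed, dtype=float)
    source = np.asarray(source, dtype=float)
    mask = np.asarray(target_absent, dtype=bool) & (source > 0)
    return float(np.median(observed[mask] / source[mask]))


if __name__ == "__main__":
    # Toy relative-abundance profiles over five species (each sums to 1).
    target = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
    source = np.array([0.1, 0.0, 0.0, 0.6, 0.3])
    observed = mix_profiles(target, source, rate=0.001)  # 0.1% contamination
    print(round(estimate_rate(observed, source, target == 0), 6))
```

At a 0.1% rate, source-specific species appear in the contaminated profile at roughly one thousandth of their abundance in the source. This is why very low rates are only observable when sequencing depth is sufficient to detect such rare species at all.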

Recherche Data Gouv

Dataset: GOULET, Lindsay; PLAZA ONATE, Florian; FAMECHON, Alexandre; QUINQUIS, Benoît; BELDA, Eugeni; PRIFTI, Edi; LE CHATELIER, Emmanuelle; GAUTREAU, Guillaume, 2025, "CroCoDeEL : training, validation and test datasets", https://doi.org/10.57745/N6JSHQ, Recherche Data Gouv, V1, UNF:6:lRPTSuZudsgZuwVJfWzD2A== [fileUNF]

Dates and versions

hal-04902208, version 1 (20-01-2025)


Citation

Lindsay Goulet, Florian Plaza Oñate, Alexandre Famechon, Benoît Quinquis, Eugeni Belda, et al. CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data. 2025. ⟨hal-04902208⟩