PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing

Gleb Goussarov; Ilse Cleenwerck; Mohamed Mysara; Natalie Leys; Pieter Monsieurs; Guillaume Tahon; Aurélien Carlier; Peter Vandamme; Rob van Houdt

doi:10.1093/bioinformatics/btz964

Journal article, Bioinformatics, Year: 2020

Gleb Goussarov

Role: Author

Belgian Nuclear Research Centre [Mol, Belgium]

Universiteit Gent = Ghent University

Ilse Cleenwerck

Role: Author

Universiteit Gent = Ghent University

Mohamed Mysara

Role: Author

Belgian Nuclear Research Centre [Mol, Belgium]

Natalie Leys

Role: Author

Belgian Nuclear Research Centre [Mol, Belgium]

Pieter Monsieurs

Role: Author

Belgian Nuclear Research Centre [Mol, Belgium]

Institute of Tropical Medicine [Antwerp]

Guillaume Tahon

Role: Author

Universiteit Gent = Ghent University

Wageningen University and Research [Wageningen]

Aurélien Carlier

Role: Author
PersonId: 737686
IdHAL: aurelien-carlier
ORCID: 0000-0001-7565-1586
IdRef: 240121198

Laboratoire des Interactions Plantes Microbes Environnement

Peter Vandamme

Role: Author

Universiteit Gent = Ghent University

Rob van Houdt

Role: Author

Belgian Nuclear Research Centre [Mol, Belgium]

Abstract

Motivation: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity (ANI). It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short-oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations, which were optimized on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short-oligonucleotide-based method TETRA for identifying bacterial species, and as accurate as ANI for identifying bacterial strains. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.

Availability and implementation: The method introduced here was implemented, together with other existing methods, in GenDisCal, dependency-free software written in C and available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available.
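
The abstract does not give the PaSiT formula itself, so the sketch below only illustrates the general idea behind short-oligonucleotide methods: each genome is reduced to a vector of k-mer frequencies (k = 4 for tetranucleotides, k = 6 for hexanucleotides), and two genomes are then compared through a distance between these vectors, with no alignment step. The function names `kmer_profile` and `profile_distance`, the counting of both strands, and the use of half the L1 (Manhattan) distance are illustrative assumptions: this is neither the PaSiT distance, nor TETRA's z-score correlation, nor the GenDisCal interface.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every k-mer over {A, C, G, T}.

    Assumption for this sketch: both strands are counted (the sequence and its
    reverse complement), so the profile does not depend on assembly orientation.
    Windows containing N or any other non-ACGT character are skipped.
    """
    seq = seq.upper()
    complement = str.maketrans("ACGT", "TGCA")
    rev_comp = seq.translate(complement)[::-1]
    # All 4**k possible words start at zero (256 for k=4, 4096 for k=6).
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for strand in (seq, rev_comp):
        for i in range(len(strand) - k + 1):
            word = strand[i:i + k]
            if word in counts:  # ambiguous windows are not in the table
                counts[word] += 1
                total += 1
    if total == 0:
        raise ValueError("sequence contains no valid k-mer")
    return {word: n / total for word, n in counts.items()}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Hypothetical distance: half the L1 distance, 0.0 (identical) to 1.0 (disjoint)."""
    if p.keys() != q.keys():
        raise ValueError("profiles must use the same k")
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)


if __name__ == "__main__":
    genome_a = "ATGCGTACGTTAGCCGATAACG" * 40  # toy sequences, not real genomes
    genome_b = "ATGCGTACGTTAGCCGTTAACG" * 40
    pa, pb = kmer_profile(genome_a), kmer_profile(genome_b)
    print(profile_distance(pa, pa))  # 0.0
    print(profile_distance(pa, pb))  # small positive value
```

Because each profile has a fixed size (4^k entries), it can be computed once per genome and reused for every pairwise comparison, which is one reason such frequency-based approaches scale better than alignment-based ANI.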

Domains

Bioinformatics, Systems Biology [q-bio.QM]

Main file

2020_goussarov_bioinformatics.pdf (6.24 MB)

Origin: Publisher files allowed on an open archive

https://hal.inrae.fr/hal-02961993

Submitted on: Wednesday, 14 October 2020, 10:53:47

Last modified on: Wednesday, 30 October 2024, 20:42:05

Dates and versions

hal-02961993, version 1 (14-10-2020)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-02961993, version 1
DOI: 10.1093/bioinformatics/btz964
PUBMED: 31899493
WOS: 000537473400003

Cite

Gleb Goussarov, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, et al. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics, 2020, 36 (8), pp. 2337-2344. ⟨10.1093/bioinformatics/btz964⟩. ⟨hal-02961993⟩
