PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Bioinformatics, 2020

Abstract

Motivation: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity (ANI). It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Methods based on short oligonucleotides offer a faster alternative, but at the expense of accuracy. Here, we address this shortcoming by providing software that implements a novel method, based on short-oligonucleotide frequencies, for computing inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations were optimized on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes. They are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.

Availability and implementation: The method introduced here was implemented, together with other existing methods, in GenDisCal. GenDisCal is a dependency-free program written in C and is available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). A Java-based graphical user interface that wraps the software is also available.
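The general idea behind alignment-free comparison is to reduce each genome to a vector of short-oligonucleotide (k-mer) frequencies and compare the vectors instead of aligning the sequences. Tetranucleotides correspond to k = 4 (256 words) and hexanucleotides to k = 6 (4096 words). The minimal Python sketch below illustrates that general idea only. It is not the PaSiT method or GenDisCal's code.

It makes several assumptions. The function names `kmer_frequencies` and `manhattan_distance` are hypothetical. Both strands are counted. Windows containing non-ACGT symbols are skipped. Plain relative frequencies with a Manhattan distance stand in for the distance actually defined in the paper.

```python
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 4) -> Dict[str, float]:
    """Hypothetical helper: relative frequencies of all 4**k k-mers over A/C/G/T.

    Assumption (not from the paper): both strands are counted, and windows
    containing any symbol other than A, C, G or T (e.g. N) are skipped.
    """
    seq = seq.upper()
    complement = str.maketrans("ACGT", "TGCA")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Forward strand and reverse complement.
    for strand in (seq, seq.translate(complement)[::-1]):
        for i in range(len(strand) - k + 1):
            word = strand[i:i + k]
            if word in counts:  # ignores windows with ambiguous bases
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {w: 0.0 for w in kmers}
    return {w: c / total for w, c in counts.items()}


def manhattan_distance(f1: Dict[str, float], f2: Dict[str, float]) -> float:
    """Hypothetical stand-in distance: sum of absolute frequency differences."""
    return sum(abs(f1[w] - f2[w]) for w in f1)


if __name__ == "__main__":
    genome_a = kmer_frequencies("ACGTACGTTAGCCGATCGATCGGATCCA", k=4)
    genome_b = kmer_frequencies("ACGTACGTTAGCCGATCGATCGGATCCT", k=4)
    print(manhattan_distance(genome_a, genome_b))
```

Each profile is built in a single linear pass over its genome and can then be reused for every pairwise comparison. This is why frequency-based approaches scale to large collections more easily than alignment-based ANI.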
Main file
2020_goussarov_bioinformatics.pdf (6.24 MB)
Origin: publisher files allowed on an open archive

Dates and versions

hal-02961993, version 1 (14-10-2020)

Cite

Gleb Goussarov, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, et al. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics, 2020, 36 (8), pp.2337-2344. ⟨10.1093/bioinformatics/btz964⟩. ⟨hal-02961993⟩