
PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing

Abstract

Motivation: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA for identifying bacterial species and as accurate as average nucleotide identity for identifying strains. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.

Availability and implementation: The method introduced here was implemented, together with other existing methods, in GenDisCal, a dependency-free software package written in C and available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available.
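To make the idea of a frequency-based inter-genomic distance concrete, the following is a minimal sketch of the general approach that short-oligonucleotide methods share: count every k-mer in each sequence, normalize the counts to frequencies, and compare the two frequency vectors. It is not the PaSiT method and not GenDisCal code. The function names `kmer_frequencies` and `euclidean_distance` are hypothetical, and the plain Euclidean distance is an assumption chosen only for illustration; the abstract does not specify the distance that PaSiT uses.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequency of every k-mer over A/C/G/T in seq.

    Hypothetical helper for illustration; windows containing any other
    symbol (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return {km: 0.0 for km in all_kmers}
    return {km: counts[km] / total for km in all_kmers}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two frequency vectors.

    Assumption: used here as a stand-in, not the distance defined by PaSiT.
    """
    return math.sqrt(sum((freq_a[km] - freq_b[km]) ** 2 for km in freq_a))


if __name__ == "__main__":
    a = kmer_frequencies("ACGTACGTTTGACCA", k=4)
    b = kmer_frequencies("ACGTACGTTTGACCG", k=4)
    print(len(a), euclidean_distance(a, b))
```

With k=4 the vector has 256 entries (tetranucleotides) and with k=6 it has 4096 (hexanucleotides), which is why such comparisons avoid the alignment step that average nucleotide identity requires.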
Document type: Journal articles

Cited literature: 41 references

https://hal.inrae.fr/hal-02961993
Contributor: Aurélien Carlier
Submitted on: Wednesday, October 14, 2020 - 10:53:47 AM
Last modification on: Wednesday, May 12, 2021 - 8:10:08 AM

File

2020_goussarov_bioinformatics....
Publisher files allowed on an open archive

Licence


Distributed under a Creative Commons Attribution - NonCommercial 4.0 International License

Citation

Gleb Goussarov, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, et al. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics, Oxford University Press (OUP), 2020, 36 (8), pp.2337-2344. ⟨10.1093/bioinformatics/btz964⟩. ⟨hal-02961993⟩

Metrics

Record views: 29
File downloads: 40