Skip to Main content Skip to Navigation
Journal articles

NGS-Indel Coder: A pipeline to code indel characters in phylogenomic data with an example of its application in milkweeds (Asclepias)

Abstract : Targeted genome sequencing approaches allow characterization of evolutionary relationships using a considerable number of nuclear genes and informative characters. However, most phylogenomic analyses only utilize single nucleotide polymorphisms (SNPs). Studies at the species level, especially in groups that have recently radiated, often recover low amounts of phylogenetically informative variation in coding regions, and require non-coding sequences, which are richer in indels, to resolve gene trees. Here, NGS-Indel Coder, a pipeline to detect and omit false positive indels inferred from assemblies of short read sequence data, was developed to resolve the relationships among and within major clades of the American milkweeds (Asclepias), which are the result of a rapid and recent evolutionary radiation, and whose phylogeny has been difficult to resolve. This pipeline was applied to a Hyb-Seq data set of 768 loci including targeted exons and flanking intron regions from 33 milkweed species. Robust species tree inference was improved by excluding small alignment partitions (<100 bp) that increased gene tree ambiguity and incongruence. To further investigate the robustness of indel coding, data sets that included small and large indels were explored, and species trees derived from concatenated loci versus coalescent methods based on gene trees were compared. The phylogeny of Asclepias obtained using nuclear data was well resolved, and phylogenetic information from indels improved resolution of specific nodes. The Temperate North American, Mexican Highland, and Incarnatae clades were well supported as monophyletic. Asclepias coulteri, which has been considered part of the Sonoran Desert clade based on plastome analyses, was placed as sister to all the other milkweed species studied here, rather than as a member of that clade. Two groups within the Temperate North American and Mexican clades were not resolved, and the inferred relationships strongly conflicted when comparing results based on data sets that did or did not include indel characters. This new pipeline represents a step forward in making maximal use of the information content in phylogenomic data sets.
Document type :
Journal articles
Complete list of metadata

https://hal.inrae.fr/hal-03100083
Contributor : Anne-Sophie Grenier <>
Submitted on : Wednesday, January 6, 2021 - 2:38:57 PM
Last modification on : Thursday, May 13, 2021 - 8:24:02 PM

Identifiers

Citation

Julien Boutte, Mark Fishbein, Aaron Liston, Shannon C.K. Straub. NGS-Indel Coder: A pipeline to code indel characters in phylogenomic data with an example of its application in milkweeds (Asclepias). Molecular Phylogenetics and Evolution, Elsevier, 2019, 139, pp.106534. ⟨10.1016/j.ympev.2019.106534⟩. ⟨hal-03100083⟩

Share

Metrics

Record views

10