The impact of OTU sequence similarity threshold on diatom-based bioassessment: A case study of the rivers of Mayotte (France, Indian Ocean)

Abstract: Extensive studies on the taxonomic resolution required for bioassessment purposes have determined that resolution above species level (genus, family) is sufficient for taxa to serve as indicators of relevant environmental pressures. The high‐throughput sequencing (HTS) and meta‐barcoding methods now used for bioassessment traditionally employ an arbitrary sequence similarity threshold (SST) around 95% or 97% to cluster sequences into operational taxonomic units (OTUs), which is considered descriptive of species‐level resolution. In this study, we analyzed the effect of the SST on the resulting diatom‐based ecological quality index, which is based on OTU abundance distribution along a defined environmental gradient, ideally avoiding taxonomic assignments that could result in high rates of unclassified OTUs and biased final values. A total of 90 biofilm samples were collected in 2014 and 2015 from 51 stream sites on Mayotte Island in parallel with measures of relevant physical and chemical parameters. HTS was performed on the biofilms using the rbcL region as the genetic marker and diatom‐specific primers. Hierarchical clustering was used to group sequences into OTUs using 20 experimental SST levels (80%–99%). An OTU‐based quality index (Idx_OTU) was developed based on a weighted average equation using the abundance profiles of the OTUs. Idx_OTU values correlated significantly with the reference pressure gradient, and this correlation reached maximal performance using an SST of 90% (well above species level delimitation). We observed an important trade‐off between the power to discriminate between sampling sites and index stability that will greatly inform future applications of the index.
Taken together, the results from this study detail a thoroughly optimized and validated approach to generating robust, reproducible, and complete indexes that will greatly facilitate effective and efficient environmental monitoring.
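
The abstract states only that Idx_OTU rests on a weighted average equation over OTU abundance profiles; the exact formula is not given here. As an illustration of the general idea, the following minimal sketch assumes a classical two-step weighted-averaging scheme: each OTU receives an optimum equal to the abundance-weighted mean of the pressure gradient over calibration samples, and a sample's index is the abundance-weighted mean of the optima of the OTUs it contains. The function names `otu_optima` and `idx_otu` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def otu_optima(abund, gradient):
    """Abundance-weighted mean gradient value per OTU (assumed scheme).

    abund:    (n_samples, n_otus) read counts or relative abundances
    gradient: (n_samples,) pressure-gradient value of each calibration sample
    Returns an (n_otus,) array; OTUs never observed get NaN.
    """
    abund = np.asarray(abund, dtype=float)
    gradient = np.asarray(gradient, dtype=float)
    totals = abund.sum(axis=0)
    # 0/0 for unobserved OTUs yields NaN; silence the warning on purpose.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (abund * gradient[:, None]).sum(axis=0) / totals


def idx_otu(sample_abund, optima):
    """Abundance-weighted mean of OTU optima for one sample (assumed scheme)."""
    sample_abund = np.asarray(sample_abund, dtype=float)
    optima = np.asarray(optima, dtype=float)
    # Use only OTUs present in the sample that also have a defined optimum.
    usable = (sample_abund > 0) & ~np.isnan(optima)
    if not usable.any():
        return float("nan")
    return float(np.average(optima[usable], weights=sample_abund[usable]))


# Toy calibration set: 3 samples x 3 OTUs; the third OTU is never observed.
calibration = [[10, 0, 0],
               [5, 5, 0],
               [0, 10, 0]]
pressure = [0.0, 0.5, 1.0]

optima = otu_optima(calibration, pressure)
score = idx_otu([5, 5, 0], optima)
```

In a threshold comparison such as the one described above, the same two functions would be applied to the OTU table produced at each SST level, and the resulting index values compared against the reference gradient.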
