
Model‐based biclustering for overdispersed count data with application in microbial ecology

Abstract : Several studies have shown that microbial communities living in animals (humans included) and in or around plants have a significant impact on the health and disease of their host and on various services, such as adaptation to stressful environments. The basic input data for studying microbiomes is a matrix of micro-organism abundances across sampling units. Such a matrix typically corresponds to taxonomic profiles derived from high-throughput sequencing of environmental samples. Biclustering is one way to study the interactions between the structure of micro-organism communities and the environmental samples they come from. We propose a latent block model (LBM) and an associated inference procedure for the biclustering of the rows and columns of abundance matrices. The LBM assumes that micro-organisms (rows) and environmental samples (columns) can both be clustered into groups characterizing preferential interaction or avoidance. We use the Poisson-Gamma distribution to model the overdispersion observed in microbial abundance data and introduce row and column effects to account for the sequencing effort of each sample and the mean abundance of each micro-organism. Because the latent variables are not conditionally independent given the observed data, classical maximum likelihood inference is intractable. We therefore derive a variational inference algorithm and propose a strategy for selecting the number of biclusters. We illustrate the flexibility and performance of our approach on a simulation study and on three ecological datasets. The model-based framework adapts to the peculiarities of microbial ecological abundance data and allows us to explore relationships between entities of two different natures. We implemented our method in the cobiclust R package, available on CRAN, and built a website with usage examples ().
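The generative side of a Poisson-Gamma latent block model can be made concrete with a short simulation. The Python sketch below is a minimal illustration under stated assumptions, not the cobiclust implementation: it assumes that a count X_ij, given that row i is in group k and column j is in group l, is Poisson with rate mu_i * nu_j * alpha_kl * U_ij, where mu_i is a row effect (mean abundance of micro-organism i), nu_j a column effect (sequencing effort of sample j), alpha_kl a block interaction term, and U_ij ~ Gamma(shape=a, rate=a) has mean 1 with a single overdispersion parameter a shared by all blocks. The function names nb_logpmf, sample_poisson, simulate_lbm and loglik_given_labels are hypothetical. Integrating U_ij out gives a negative binomial count with mean m = mu_i * nu_j * alpha_kl and variance m + m^2 / a, which is how the Gamma layer produces overdispersion.

```python
import math
import random


def nb_logpmf(x: int, mean: float, a: float) -> float:
    """Log-pmf of count x under the Poisson-Gamma mixture (mean > 0, a > 0).

    Assumes U ~ Gamma(shape=a, rate=a), so E[U] = 1, and X | U ~ Poisson(mean * U).
    Then X is negative binomial with E[X] = mean and Var[X] = mean + mean**2 / a.
    """
    p = a / (a + mean)  # NB success probability in the (size=a, prob=p) form
    return (math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(p) + x * math.log(1.0 - p))


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method.

    exp(-lam) underflows for lam above roughly 745, so keep rates small.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_lbm(mu, nu, alpha, row_probs, col_probs, a, seed=0):
    """Draw row labels z, column labels w and a count matrix x (hypothetical model).

    mu[i]: effect of row i, nu[j]: effect of column j,
    alpha[k][l]: interaction of row group k with column group l.
    """
    rng = random.Random(seed)
    z = rng.choices(range(len(row_probs)), weights=row_probs, k=len(mu))
    w = rng.choices(range(len(col_probs)), weights=col_probs, k=len(nu))
    x = []
    for i in range(len(mu)):
        row = []
        for j in range(len(nu)):
            u = rng.gammavariate(a, 1.0 / a)  # shape a, scale 1/a -> mean 1
            row.append(sample_poisson(mu[i] * nu[j] * alpha[z[i]][w[j]] * u, rng))
        x.append(row)
    return z, w, x


def loglik_given_labels(x, z, w, mu, nu, alpha, a):
    """Log-likelihood of x given fixed labels, with every U_ij integrated out."""
    return sum(nb_logpmf(x[i][j], mu[i] * nu[j] * alpha[z[i]][w[j]], a)
               for i in range(len(mu)) for j in range(len(nu)))


if __name__ == "__main__":
    mu = [1.0, 2.0, 0.5, 1.5]
    nu = [1.0, 0.8, 1.2]
    alpha = [[3.0, 0.2], [0.5, 2.0]]  # 2 row groups x 2 column groups
    z, w, x = simulate_lbm(mu, nu, alpha, [0.5, 0.5], [0.5, 0.5], a=1.0, seed=1)
    print(z, w, x)
    print(loglik_given_labels(x, z, w, mu, nu, alpha, a=1.0))
```

With the labels fixed, the log-likelihood is a plain sum over cells. The observed-data likelihood, however, sums this quantity over every assignment of the n rows to K groups and the m columns to L groups (K^n L^m terms), and the posterior of the row and column labels does not factorize given the counts; this is the intractability that motivates a variational approximation instead of exact maximum likelihood.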
Document type :
Journal articles

https://hal.inrae.fr/hal-03323318
Contributor : Sophie Schbath
Submitted on : Friday, August 20, 2021 - 5:31:27 PM
Last modification on : Friday, August 27, 2021 - 3:30:02 AM

Licence


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Julie Aubert, Sophie Schbath, Stéphane Robin. Model‐based biclustering for overdispersed count data with application in microbial ecology. Methods in Ecology and Evolution, Wiley, 2021, 12 (6), pp.1050-1061. ⟨10.1111/2041-210X.13582⟩. ⟨hal-03323318⟩
