Model‐based biclustering for overdispersed count data with application in microbial ecology - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Methods in Ecology and Evolution, Year: 2021

Model‐based biclustering for overdispersed count data with application in microbial ecology

Abstract

Several studies have shown that microbial communities living in animals (humans included) and in or around plants have a significant impact on the health and disease of their hosts and on various services, such as adaptation to stressful environments. The basic input for studying microbiomes is a matrix of micro-organism abundances across sampling units. Such a matrix typically corresponds to taxonomic profiles derived from the high-throughput sequencing of environmental samples. Biclustering is one way to study the interactions between the structure of micro-organism communities and the environmental samples they come from. We propose a latent block model (LBM) and an associated inference procedure for the biclustering of rows and columns of abundance matrices. The LBM assumes that micro-organisms (rows) and environmental samples (columns) can both be clustered into groups characterizing preferential interaction or avoidance. We use the Poisson-Gamma distribution to model the overdispersion observed in microbial abundance data and introduce row and column effects to account for the sequencing effort in each sample and the mean abundance of each micro-organism. Because the latent variables are not independent conditionally on the observed ones, classical maximum likelihood inference is intractable. We therefore derive a variational-based inference algorithm and propose a strategy to select the number of biclusters. We illustrate the flexibility and performance of our approach both on a simulation study and on three ecological datasets. The model-based framework lets us adapt to the peculiarities of microbial ecological abundance data and explore relationships between entities of two different natures. We implemented our method in the cobiclust R package, available on CRAN, and built a website with usage examples.
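To make the generative side of the model concrete, here is a minimal simulation sketch in Python, written only from the description in the abstract. It is not the cobiclust API and it does not implement the variational inference. The names (`simulate_lbm`, `sample_poisson`, `mu`, `nu`, `alpha`, `a`), the multiplicative form of the mean, the Gamma parametrization with mean 1, and the uniform ranges used for the row and column effects are all assumptions made for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's product-of-uniforms method.

    The rate is split into chunks of at most 500 so that exp(-chunk)
    does not underflow; a sum of independent Poissons is Poisson.
    """
    total = 0
    while lam > 0:
        step = min(lam, 500.0)
        limit = math.exp(-step)
        prod = rng.random()
        while prod > limit:
            total += 1
            prod *= rng.random()
        lam -= step
    return total


def simulate_lbm(n_rows, n_cols, row_props, col_props, alpha, a, seed=0):
    """Simulate counts from a Poisson-Gamma latent block model (assumed form).

    row_props, col_props: mixing proportions of the row and column groups.
    alpha[k][g]: interaction term for row group k and column group g.
    a: Gamma shape; smaller values give stronger overdispersion.
    """
    rng = random.Random(seed)
    # Latent group of each micro-organism (row) and each sample (column).
    z = rng.choices(range(len(row_props)), weights=row_props, k=n_rows)
    w = rng.choices(range(len(col_props)), weights=col_props, k=n_cols)
    # Hypothetical row effects (mean abundance) and column effects
    # (sequencing effort); the uniform ranges are arbitrary.
    mu = [rng.uniform(0.5, 2.0) for _ in range(n_rows)]
    nu = [rng.uniform(0.5, 2.0) for _ in range(n_cols)]
    counts = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Gamma(shape=a, scale=1/a) has mean 1 and variance 1/a.
            u = rng.gammavariate(a, 1.0 / a)
            lam = mu[i] * nu[j] * alpha[z[i]][w[j]] * u
            row.append(sample_poisson(rng, lam))
        counts.append(row)
    return counts, z, w


# Two row groups, three column groups, with contrasted interaction terms.
alpha = [[1.0, 10.0, 30.0],
         [20.0, 2.0, 5.0]]
counts, z, w = simulate_lbm(6, 8, [0.5, 0.5], [0.3, 0.3, 0.4], alpha, a=2.0, seed=42)
```

Mixing the Poisson rate over a Gamma variable with mean 1 leaves the expected count unchanged while inflating its variance, which is how a Poisson-Gamma formulation captures overdispersion. The block structure enters only through `alpha`, so the row and column effects can absorb differences in mean abundance and sequencing effort.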

Dates and versions

hal-03323318, version 1 (20-08-2021)

Licence

Attribution

Cite

Julie Aubert, Sophie Schbath, Stéphane Robin. Model‐based biclustering for overdispersed count data with application in microbial ecology. Methods in Ecology and Evolution, 2021, 12 (6), pp.1050-1061. ⟨10.1111/2041-210X.13582⟩. ⟨hal-03323318⟩