Machine Learning classification performance on mechanistic representations of the gut microbiota built from abundance profiles
Abstract
Understanding the gut microbiota and its mechanisms has become a major focus of medical research, with a growing number of studies correlating it with a variety of pathologies [1]. Machine Learning methods have been applied to this problem, treating the microbiome as a predictor of the subjects’ health [2-5]. These approaches, however, have yet to exploit the augmentation of microbiome data that could be achieved by gathering information correlated with the microbiota’s composition. In particular, the functional annotations of the identified micro-organisms have been suggested as a promising lead for understanding the microbiota as a metabolic network [6,7]. In line with this idea, we propose a new method that shifts the representation of the gut microbiota from relative OTU abundances to a numeric mapping of the associated functional annotations, creating a mechanistic description of the microbial community. We then explored the performance of Random Forest classifiers, a classic Machine Learning approach for microbiota classification, when applied to data converted to this new representation. We found that, for a small sacrifice in classification performance, this approach can help highlight important metabolic mechanisms. Exploiting this method would also yield more thorough and complete results than the standard approach, which relies on finding OTUs that discriminate between classes of subjects.
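To make the change of representation concrete, the following minimal sketch projects a sample's relative OTU abundances onto functional annotations. It is an illustration under stated assumptions, not the method evaluated in this work: the abstract does not specify how the numeric mapping is computed, so the sketch assumes an abundance-weighted sum of annotation counts. The function name `functional_profile`, the OTU identifiers, and the annotation labels are all hypothetical.

```python
from collections import defaultdict


def functional_profile(otu_abundances, otu_annotations):
    """Project relative OTU abundances onto functional annotations.

    Assumption (not taken from the paper): the score of an annotation is
    the sum, over the OTUs carrying it, of relative abundance multiplied
    by the number of times the OTU carries that annotation.
    """
    profile = defaultdict(float)
    for otu, abundance in otu_abundances.items():
        # OTUs with no known annotation contribute nothing to the profile.
        for annotation, count in otu_annotations.get(otu, {}).items():
            profile[annotation] += abundance * count
    return dict(profile)


# Hypothetical sample: relative abundances summing to 1.
sample = {"OTU_1": 0.5, "OTU_2": 0.25, "OTU_3": 0.25}

# Hypothetical annotation table: OTU -> {annotation: count}.
# OTU_3 is deliberately left unannotated.
annotations = {
    "OTU_1": {"butyrate_synthesis": 1, "glycolysis": 1},
    "OTU_2": {"glycolysis": 2},
}

profile = functional_profile(sample, annotations)
```

In such a setup, the resulting per-sample profiles (one numeric feature per annotation) would replace the OTU abundance vectors as the input to a classifier such as a Random Forest, so that feature importances point to metabolic functions rather than to individual OTUs.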