Journal article, Computational and Structural Biotechnology Journal, Year: 2026

A cooperative learning framework for the integration of metabolomic data from multiple cohorts and common phenotype identification

Abstract

Integrating metabolomic data from multiple studies/cohorts could be an efficient strategy to enhance statistical power and identify robust biomarkers. However, challenges associated with batch effects, study-specific biases, and dataset heterogeneity hinder the reproducibility and translation of results. Limitations have been reported for shared-variable-mode data integration approaches, both for early fusion, which struggles with inter-study variability, and for late fusion, which may overlook inter-dataset relationships. Here, we propose a novel cooperative learning framework for integrating metabolomics data from multiple studies, designed to improve candidate biomarker discovery by balancing the advantages of early and late fusion while mitigating study-specific confounders. The proposed approach leverages univariate and multivariate analyses together with an optimized loss function. To implement it, early-stage integration was based on a multiblock method (MINT-PLS-DA), while separate PLS-DA models were used for late fusion. Univariate analysis was performed via a mixed model. The approach was first evaluated under controlled conditions using synthetic data, and then applied to two existing untargeted human metabolomics datasets. Preliminary assessment focused on batch effect reduction across datasets and on agreement between early and late fusion outputs. On real-world data, 10% of the initial features were stable across early and late fusion, which represents improved consistency compared with the results previously published separately for each of the integrated datasets. All results demonstrate the ability of the proposed approach to capture the common part of the phenotypes. The developed integration model based on cooperative learning leverages the complementary strengths of early and late fusion, offering an efficient solution for metabolomics data integration and enhancing the reliability of potential biomarker discovery.
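The abstract reports the share of initial features that remained stable across early and late fusion. The sketch below shows one way such a stability fraction could be computed. It is an illustration under stated assumptions, not the authors' implementation: it assumes (hypothetically) that each PLS-DA model selects features whose VIP score exceeds 1, and that a feature counts as selected by late fusion only if every study-specific model selects it. The function names, the threshold and the toy scores are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical selection rule: VIP > 1 is a common PLS-DA convention,
# assumed here for illustration only (not taken from the article).
VIP_THRESHOLD = 1.0


def select_features(vip: Dict[str, float], threshold: float = VIP_THRESHOLD) -> Set[str]:
    """Return the features whose VIP score is strictly above the threshold."""
    return {feat for feat, score in vip.items() if score > threshold}


def stable_features(early_vip: Dict[str, float],
                    late_vips: List[Dict[str, float]],
                    threshold: float = VIP_THRESHOLD) -> Set[str]:
    """Features selected by the early-fusion model AND by every per-study late-fusion model."""
    stable = select_features(early_vip, threshold)
    for vip in late_vips:
        # Assumption: late fusion keeps only features selected in all studies.
        stable &= select_features(vip, threshold)
    return stable


def stable_fraction(early_vip: Dict[str, float],
                    late_vips: List[Dict[str, float]],
                    n_initial: int,
                    threshold: float = VIP_THRESHOLD) -> float:
    """Fraction of the initial feature set that is stable across early and late fusion."""
    return len(stable_features(early_vip, late_vips, threshold)) / n_initial


if __name__ == "__main__":
    # Toy VIP scores, made up for illustration (not study data).
    early = {"m1": 1.8, "m2": 1.2, "m3": 0.4, "m4": 1.1, "m5": 0.9}
    late = [
        {"m1": 1.5, "m2": 0.8, "m3": 1.3, "m4": 1.2, "m5": 0.7},   # study A
        {"m1": 2.0, "m2": 1.4, "m3": 0.2, "m4": 1.05, "m5": 1.6},  # study B
    ]
    print(sorted(stable_features(early, late)))                # ['m1', 'm4']
    print(stable_fraction(early, late, n_initial=len(early)))  # 0.4
```

Requiring agreement with every study-specific model is a deliberately strict reading of "stable"; a looser variant could require selection in a majority of studies instead.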

Dates and versions

hal-05453321, version 1 (12 January 2026)

Identifiers

Cite

E. Salanon, E. Jules, B. Comte, J. Boccard, Estelle Pujos-Guillot. A cooperative learning framework for the integration of metabolomic data from multiple cohorts and common phenotype identification. Computational and Structural Biotechnology Journal, 2026, 31, pp.346-354. ⟨10.1016/j.csbj.2025.12.020⟩. ⟨hal-05453321⟩