Optimizing data integration improves Gene Regulatory Network inference in Arabidopsis thaliana
Journal Articles, Bioinformatics, Year: 2024

Abstract

Motivations
Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process.

Results
We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction.

Availability and implementation
The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction
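
To illustrate the idea of tuning integration strength per gene, here is a minimal Python sketch. It is not the authors' DIOgene implementation, which is written in R and lives in the repository above. Everything in it is an assumption made for illustration: the penalty scheme (regulators supported by the prior receive a penalty reduced by a factor alpha), the held-out split used to measure prediction error, the shuffled prior used as a simulated null, the synthetic data, and all function names. For one target gene, it compares the prediction error obtained with the true prior against the error obtained with shuffled priors, and keeps the alpha with the largest improvement.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + sum_j penalties[j]*|b_j| (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of a linear model."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def fit_and_score(train, test, prior, lam, alpha):
    # Assumed scheme: prior-supported regulators (prior = 1) get penalty lam * (1 - alpha).
    penalties = [lam * (1.0 - alpha * s) for s in prior]
    beta = weighted_lasso(train[0], train[1], penalties)
    return mse(test[0], test[1], beta)


def select_alpha(train, test, prior, lam, alphas, n_null=5, seed=0):
    """Pick the alpha whose held-out error improves most over a shuffled-prior null."""
    rng = random.Random(seed)
    best_alpha, best_gain = 0.0, 0.0  # fall back to no integration
    for alpha in alphas:
        true_err = fit_and_score(train, test, prior, lam, alpha)
        null_errs = [
            fit_and_score(train, test, rng.sample(prior, len(prior)), lam, alpha)
            for _ in range(n_null)
        ]
        gain = sum(null_errs) / n_null - true_err
        if gain > best_gain:
            best_alpha, best_gain = alpha, gain
    return best_alpha, best_gain


# Synthetic data for one target gene: regulators 0 and 1 are the true ones,
# and the hypothetical motif prior happens to support exactly those two.
rng = random.Random(1)
n, p = 60, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5) for row in X]
prior = [1, 1, 0, 0, 0, 0]
train, test = (X[:40], y[:40]), (X[40:], y[40:])
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
best_alpha, best_gain = select_alpha(train, test, prior, lam=0.5, alphas=alphas)
print(best_alpha, best_gain)
```

Running the selection once per target gene gives a gene-specific integration strength. When the prior does not reduce prediction error relative to the shuffled null, this sketch falls back to alpha = 0, meaning no integration for that gene.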
Main file
CassanO.-et al-postprint-Bioinformatics-2024.pdf (2.55 MB)
CassanO.-et al-Bioinformatics-2024.pdf (2.54 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04625944, version 1 (26-06-2024)

Identifiers

HAL Id: hal-04625944
DOI: 10.1093/bioinformatics/btae415

Cite

Oceane Cassan, Charles-Henri Lecellier, Antoine Martin, Laurent Bréhélin, Sophie Lèbre. Optimizing data integration improves Gene Regulatory Network inference in Arabidopsis thaliana. Bioinformatics, 2024, 40 (7), pp.btae415. ⟨10.1093/bioinformatics/btae415⟩. ⟨hal-04625944⟩