A semi-parametric Gaussian copula model for heterogeneous network inference: an application to multi-omics data - Département GA INRAE
Preprint, Working Paper. Year: 2024

Abstract

Integrating large-scale heterogeneous data for network inference is a key methodological challenge, especially in multi-omics data analysis. We propose a novel procedure based on Gaussian copula methods that allows the joint analysis of data of various types (continuous and discrete). The proposed estimation procedure is semi-parametric and requires no explicit assumption about the marginal distributions. This offers great flexibility for the analysis of biological data that may not perfectly follow any pre-specified parametric distribution. We present a detailed proof of the pairwise likelihood calculation for mixed-type data. We show the equivalence between a block-wise diagonal structure in the copula correlation matrix and block-wise mutual independence in the observed data. We characterize the lower and upper extreme values of the copula parameter in terms of the observed data when a Bernoulli distribution is involved. In an extensive simulation study, we show that the proposed estimation procedure, based on a pairwise-likelihood approach, accurately estimates the copula correlation matrix, even for a large number of variables (several hundred) and a small number of replicates (several dozen). The proposed method is also applied to a real ICGC breast cancer dataset and is implemented in the freely available R package heterocop.
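To illustrate why a Gaussian copula needs no assumption about the margins, here is a minimal Python sketch for one pair of continuous variables. It is not the pairwise-likelihood estimator of the paper and does not use the heterocop API: it swaps in the simpler rank-based normal-scores estimator, which does not handle discrete variables. All function names, the exponential and log-normal margins, and the values n = 2000 and rho = 0.6 are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides Phi (cdf) and Phi^{-1} (inv_cdf)


def simulate_gaussian_copula_pair(n, rho, seed=0):
    """Draw n pairs whose dependence is a Gaussian copula with correlation rho.

    The margins are deliberately non-Gaussian (hypothetical choice):
    exponential(1) for the first variable, log-normal for the second.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # Inverse exponential CDF applied to Phi(z1); uses 1 - Phi(z) = Phi(-z).
        xs.append(-math.log(STD_NORMAL.cdf(-z1)))
        ys.append(math.exp(z2))
    return xs, ys


def normal_scores(values):
    """Map values to normal scores through the rescaled empirical CDF, rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_copula_correlation(xs, ys):
    """Normal-scores estimate of the copula correlation (continuous margins only)."""
    return pearson(normal_scores(xs), normal_scores(ys))


xs, ys = simulate_gaussian_copula_pair(n=2000, rho=0.6, seed=1)
rho_hat = estimate_copula_correlation(xs, ys)
print(round(rho_hat, 2))
```

Because the estimate depends on the data only through ranks, applying any strictly increasing transformation to either variable leaves it unchanged, which is the sense in which the margins need not be specified. Discrete variables produce ties and call for a likelihood-based treatment such as the one described in the abstract.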
Main file
article_hal.pdf (3.02 MB)
data.csv (659.1 KB)
supplementary_hal.pdf (368.34 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04847648, version 1 (19-12-2024)

Identifiers

  • HAL Id: hal-04847648, version 1

Cite

Ekaterina Tomilina, Gildas Mazo, Florence Jaffrézic. A semi-parametric Gaussian copula model for heterogeneous network inference: an application to multi-omics data. 2024. ⟨hal-04847648⟩