Copula-based models for multi-omic network inference
Abstract
Large-scale heterogeneous data integration for network inference is a key methodological challenge, especially in the context of multi-omic data analysis. We propose a novel procedure based on copula theory that allows the joint analysis of data of various types (continuous, discrete, etc.). The proposed estimation procedure is semi-parametric and therefore requires no explicit assumption about the marginal distributions of the data. This offers great flexibility for the analysis of biological data, which may not exactly follow any pre-specified parametric distribution. We also present a theoretical proof of the equivalence between block-wise independence in the copula correlation matrix and block-wise independence in the correlation structure of the data themselves. In an extensive simulation study, we show that the proposed estimation procedure, based on a pairwise pseudo-likelihood approach, accurately estimates the copula correlation matrix, even with a fairly large number of variables (several hundred) and a fairly small number of replicates (a few dozen). The proposed method is also applied to a real ICGC breast cancer dataset.
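To illustrate why a semi-parametric copula approach needs no assumption on the marginals, the following minimal sketch estimates a copula correlation matrix from ranks alone. It is not the pairwise pseudo-likelihood procedure proposed here: it assumes a Gaussian copula and purely continuous variables, and it substitutes the simpler rank-based normal-scores estimator. The function names `normal_scores` and `copula_correlation`, the simulated data, and the chosen correlation values are all hypothetical and serve only as an illustration.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(x):
    """Map each column to normal scores using only within-column ranks.

    Ranks are rescaled by (n + 1) so the pseudo-observations lie strictly
    inside (0, 1) before applying the standard normal quantile function.
    """
    n = x.shape[0]
    u = rankdata(x, axis=0) / (n + 1)
    return norm.ppf(u)


def copula_correlation(x):
    """Estimate the copula correlation matrix (Gaussian copula assumed).

    Simplified stand-in: Pearson correlation of the normal scores,
    valid as a sketch for continuous margins only.
    """
    z = normal_scores(x)
    return np.corrcoef(z, rowvar=False)


# Simulated example (assumed values): variables 1 and 2 are dependent,
# variable 3 is independent of both.
rng = np.random.default_rng(0)
true_corr = np.array([
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
latent = rng.multivariate_normal(np.zeros(3), true_corr, size=2000)

# Apply a different strictly increasing transform to each margin, so that
# none of the observed variables is normally distributed.
data = np.column_stack([
    np.exp(latent[:, 0]),      # log-normal margin
    latent[:, 1] ** 3,         # heavy-tailed margin
    norm.cdf(latent[:, 2]),    # uniform margin
])

estimate = copula_correlation(data)
print(np.round(estimate, 2))
```

Because ranks are invariant under strictly increasing transformations, the estimate depends only on the dependence structure and not on the marginal distributions. Discrete variables produce ties and would require a different treatment, such as the pairwise pseudo-likelihood approach described above.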