Large-scale significance testing of high-throughput data with FAMT
Abstract
The analysis of complex systems with high-throughput technologies raises new challenges for statistics. In systems biology, for example, microarray technology gives access to whole-genome transcription datasets. Combined with large-scale significance testing, it has become a powerful tool for identifying genes whose expression variations are significantly related to a given trait. Since Benjamini and Hochberg (1995) introduced their procedure for controlling the False Discovery Rate (FDR), multiple testing theory has been substantially renewed. The heterogeneity of microarray data, however, has long been ignored in statistical models. Some recent papers (see Friguet et al., 2009) suggest that unmodeled heterogeneity factors may induce dependence across gene expressions and affect the consistency of multiple testing results. Friguet et al. (2009) propose a supervised factor model to identify the latent heterogeneity components, a method implemented in the R package FAMT (Factor Analysis for Multiple Testing). The talk aims both at presenting the statistical handling of multiple testing dependence proposed in Friguet et al. (2009) and at illustrating the performance of the method through a microarray data analysis using the R package FAMT. As described in Blum et al. (2010), this microarray study analyses the relationships between the abdominal fatness of chickens and their hepatic transcriptome profiles. Heterogeneity components are extracted from the data by an EM algorithm. Additional functionalities optimize the procedure, such as the estimation of the proportion of true null hypotheses and of the optimal number of factors.
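As a point of reference for the FDR control mentioned in the abstract, the Benjamini and Hochberg (1995) step-up procedure can be sketched in a few lines. This is a minimal illustration in plain Python and not the FAMT implementation (FAMT is an R package, and its factor-adjusted tests are not reproduced here). The function name `benjamini_hochberg`, the `alpha` parameter, and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    it assumes independent (or positively dependent) tests.
    """
    m = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example = [0.01, 0.04, 0.03, 0.5]
decisions = benjamini_hochberg(example, alpha=0.05)
```

The procedure compares the k-th smallest p-value with the threshold `alpha * k / m`. Its FDR guarantee rests on independence or positive dependence among the tests, which is the kind of assumption that unmodeled heterogeneity across gene expressions can undermine.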