Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article. Journal of Statistical Software. Year: 2022

Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

Abstract

We describe a new algorithm and R package for peak detection in genomic data sets using constrained changepoint models. These models detect changes from background to peak regions by imposing the constraint that the mean should alternately increase then decrease. An existing algorithm for this problem gives state-of-the-art accuracy, but it is computationally expensive when the number of changes is large. We propose a dynamic programming algorithm that jointly estimates the number of peaks and their locations by minimizing a cost function consisting of a data fitting term and a penalty for each changepoint. Empirically, this algorithm has a cost of O(N log(N)) for analyzing data of length N. We also propose a sequential search algorithm that finds the best solution with K segments in O(log(K) N log(N)) time, which is much faster than the previous O(K N log(N)) algorithm. We show that our disk-based implementation in the PeakSegDisk R package can be used to quickly compute constrained optimal models with many changepoints, which are needed to analyze typical genomic data sets that have tens of millions of observations.
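The penalized objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Poisson loss as the data fitting term and uses plain segment means rather than the constrained optimum. The function names (`poisson_loss`, `penalized_cost`, `is_up_down`) are hypothetical and are not part of PeakSegDisk. The sketch only evaluates the cost of one given segmentation and checks the up-down pattern; it does not implement the GFPOP dynamic programming search.

```python
import math
from typing import List, Sequence, Tuple


def poisson_loss(counts: Sequence[float], mean: float) -> float:
    """Poisson negative log-likelihood of `counts` for a single segment mean.

    The constant log(z!) term is dropped because it does not depend on the mean.
    """
    if mean <= 0:
        # A zero mean fits only an all-zero segment.
        return 0.0 if all(z == 0 for z in counts) else math.inf
    return sum(mean - z * math.log(mean) for z in counts)


def penalized_cost(
    counts: Sequence[float], ends: Sequence[int], penalty: float
) -> Tuple[float, List[float]]:
    """Data fitting term plus `penalty` for each changepoint.

    `ends` holds the exclusive end index of every segment, so the last
    entry must equal len(counts). Segment means are the unconstrained
    sample means (an assumption made to keep the sketch short).
    """
    total = 0.0
    means: List[float] = []
    start = 0
    for end in ends:
        segment = counts[start:end]
        mean = sum(segment) / len(segment)
        means.append(mean)
        total += poisson_loss(segment, mean)
        start = end
    n_changes = len(ends) - 1
    return total + penalty * n_changes, means


def is_up_down(means: Sequence[float]) -> bool:
    """True if the changes alternate up, down, up, ... starting with an up change.

    Strict inequalities are used here; that choice is an assumption of this sketch.
    """
    for i in range(1, len(means)):
        diff = means[i] - means[i - 1]
        if i % 2 == 1 and diff <= 0:  # odd-numbered changes must go up
            return False
        if i % 2 == 0 and diff >= 0:  # even-numbered changes must go down
            return False
    return True


if __name__ == "__main__":
    # Toy coverage profile: background, one peak, background.
    data = [1, 2, 1, 10, 12, 11, 2, 1, 1]
    cost, seg_means = penalized_cost(data, ends=[3, 6, 9], penalty=5.0)
    print(cost, seg_means, is_up_down(seg_means))
```

A larger `penalty` makes each additional changepoint more expensive, so the minimizer of this kind of objective has fewer peaks; this is how a single penalty value controls the number of peaks and their locations jointly.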
Main file
v101i10.pdf (1.06 MB)
Origin: Publication funded by an institution
License: CC BY (Attribution)

Dates and versions

hal-04191295, version 1 (29-01-2024)

Cite

Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque. Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data. Journal of Statistical Software, 2022, 101 (10), ⟨10.18637/jss.v101.i10⟩. ⟨hal-04191295⟩