New efficient algorithms for multiple change-point detection with reproducing kernels

Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations and not only in the mean or variance. Some of these approaches enjoy good statistical properties (oracle inequality, consistency). Nonetheless, they have a high computational cost in terms of both time and memory. This makes their application difficult even for small and medium sample sizes (n < 10^4). This computational issue is addressed by first describing a new efficient procedure for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It is based on an exact optimization algorithm and handles medium-size signals (up to n ≈ 10^5). Second, a faster procedure (based on an approximate optimization algorithm) is described. It relies on a low-rank approximation to the Gram matrix and is linear in time and space. The resulting procedure can be applied to large-scale signals (n ≥ 10^6). These two procedures (based on the exact or approximate optimization algorithms) have been implemented in R and C for various kernels. The computational and statistical performances of these new algorithms have been assessed through empirical experiments. The new algorithms are observed to run faster than the other procedures considered. Finally, simulations confirmed the higher statistical accuracy of kernel-based approaches in detecting changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches for analyzing complex biological profiles made of DNA copy number and allele B frequencies.
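
To make the exact optimization step concrete, here is a minimal Python sketch of kernel change-point detection by dynamic programming. It is an illustration under stated assumptions, not the authors' R and C implementation: the function names `gaussian_gram` and `kernel_segmentation`, the Gaussian kernel, and the `bandwidth` parameter are all hypothetical choices made for this example. The sketch uses the usual kernel least-squares segment cost (sum of k(x_i, x_i) over the segment minus the mean-normalized sum of k(x_i, x_j) over all pairs in the segment). Unlike the procedure described in the abstract, it stores the full Gram matrix, so it needs quadratic rather than linear memory.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel (hypothetical helper for this sketch)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_segmentation(gram, n_segments):
    """Exact dynamic programming over segmentations into n_segments pieces.

    Returns (change_points, total_cost), where each change point is the
    0-based index at which a new segment starts.
    Note: this naive version keeps the whole Gram matrix in memory (O(n^2)).
    """
    n = gram.shape[0]
    # S[a, b] holds the sum of gram[:a, :b], so any block sum costs O(1).
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = gram.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate(([0.0], np.cumsum(np.diag(gram))))

    def cost(starts, end):
        # Kernel least-squares cost of segments [start, end) for every start.
        block = S[end, end] - S[starts, end] - S[end, starts] + S[starts, starts]
        return diag[end] - diag[starts] - block / (end - starts)

    best = np.full((n_segments + 1, n + 1), np.inf)
    arg = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for d in range(1, n_segments + 1):
        for t in range(d, n + 1):
            starts = np.arange(d - 1, t)
            cand = best[d - 1, starts] + cost(starts, t)
            j = int(np.argmin(cand))
            best[d, t] = cand[j]
            arg[d, t] = starts[j]

    # Backtrack from the last segment to recover the change points.
    change_points = []
    t = n
    for d in range(n_segments, 1, -1):
        t = int(arg[d, t])
        change_points.append(t)
    return sorted(change_points), float(best[n_segments, n])


if __name__ == "__main__":
    # Toy signal with one abrupt change halfway through.
    signal = np.concatenate([np.zeros(20), np.full(20, 5.0)])
    cps, total = kernel_segmentation(gaussian_gram(signal), n_segments=2)
    print(cps, total)
```

The nested loops give a time cost of order D·n^2 for D segments, which matches the quadratic-in-time regime mentioned above. Replacing the Gram matrix with a low-rank factorization, as the abstract's second procedure does, is what would remove the quadratic dependence; that step is not shown here.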

Keywords

Kernel method; Nonparametric change-point detection; Model selection; Algorithms; Dynamic programming; Allele B fraction; Gram matrix; DNA copy number

Domains

Mathematics [math]; Computer Science [cs]; Life Sciences [q-bio]; Plant Biology

https://hal.inrae.fr/hal-02621669

Submitted on: Tuesday, 26 May 2020, 03:12:03

Last modified on: Friday, 6 September 2024, 03:17:09

Dates and versions

hal-02621669, version 1 (26-05-2020)

Identifiers

HAL Id: hal-02621669, version 1
DOI: 10.1016/j.csda.2018.07.002
PRODINRA: 459243
WOS: 000445989700014

Cite

Alain Celisse, Guillemette Marot, Morgane Pierre-Jean, Guillem Rigaill. New efficient algorithms for multiple change-point detection with reproducing kernels. Computational Statistics and Data Analysis, 2018, 128, pp.200-220. ⟨10.1016/j.csda.2018.07.002⟩. ⟨hal-02621669⟩

Collections

UNIV-PARIS7 CNRS UNIV-EVRY INRA INSMI USPC LAMME IPS2 UNIV-PARIS-SACLAY UNIV-LILLE INRAE UP-SCIENCES ANR GS-ENGINEERING MATHNUM LPP-MATH BIOLOGIE_ET_AMELIORATION_DES_PLANTES
