Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference poster, Year: 2019

Abstract

We focus on a modification of classical hierarchical agglomerative clustering (HAC) in which only adjacent clusters (according to the ordering of positions within a chromosome) can be merged. Adjacency-constrained HAC is implemented in the R package rioja. Our main contribution with respect to existing work is an efficient implementation of adjacency-constrained HAC for the case where the similarity between genomically distant objects can be considered negligible. We propose an algorithm that is almost linear in time and space with respect to the number of objects to be clustered. It uses a sparse band strategy based on pre-computing certain cumulative sums of similarities, combined with a min-heap that efficiently stores and maintains the list of candidate merges. This algorithm is implemented in the R package adjclust, which is available at https://CRAN.R-project.org/package=adjclust. We provide applications to SNP and Hi-C datasets.
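The two ingredients named in the abstract, cumulative sums of similarities and a min-heap of candidate merges, can be illustrated with a small Python sketch. This is not the adjclust implementation: all function names are hypothetical, a Ward-type linkage expressed through similarity sums is assumed, and for brevity the cumulative sums are dense 2D prefix sums over the full matrix (quadratic in time and space) rather than the band-limited sums the poster describes. Only the merge loop, with its heap and lazy discarding of stale candidates, reflects the general strategy.

```python
import heapq

def prefix_sums(K):
    """2D cumulative sums: P[i][j] = sum of K[r][c] for r < i, c < j."""
    n = len(K)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        row = 0.0
        for j in range(n):
            row += K[i][j]
            P[i + 1][j + 1] = P[i][j + 1] + row
    return P

def block_sum(P, r0, r1, c0, c1):
    """Sum of K over rows [r0, r1) and columns [c0, c1), in O(1)."""
    return P[r1][c1] - P[r0][c1] - P[r1][c0] + P[r0][c0]

def ward_linkage(P, a, b, c):
    """Assumed Ward-type linkage between adjacent clusters [a, b) and [b, c)."""
    nu, nv = b - a, c - b
    s_uu = block_sum(P, a, b, a, b)
    s_vv = block_sum(P, b, c, b, c)
    s_uv = block_sum(P, a, b, b, c)
    d2 = s_uu / nu**2 + s_vv / nv**2 - 2.0 * s_uv / (nu * nv)
    return nu * nv / (nu + nv) * d2

def adjacent_hac(K):
    """Merge only neighbouring clusters; return (start, boundary, end, height)."""
    n = len(K)
    P = prefix_sums(K)
    # Clusters are intervals [start, end) kept in a doubly linked list.
    start = list(range(n))
    end = [i + 1 for i in range(n)]
    left = [i - 1 for i in range(n)]                     # -1: no neighbour
    right = [i + 1 if i + 1 < n else -1 for i in range(n)]
    alive = [True] * n
    heap = [(ward_linkage(P, i, i + 1, i + 2), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    merges = []
    while heap:
        height, u, v = heapq.heappop(heap)
        if not (alive[u] and alive[v]):
            continue  # stale candidate: one side was already merged
        w = len(start)  # the merged cluster gets a fresh id
        start.append(start[u]); end.append(end[v])
        left.append(left[u]); right.append(right[v])
        alive.append(True)
        alive[u] = alive[v] = False
        merges.append((start[u], end[u], end[v], height))
        if left[w] != -1:   # new candidate with the left neighbour
            p = left[w]
            right[p] = w
            heapq.heappush(heap, (ward_linkage(P, start[p], end[p], end[w]), p, w))
        if right[w] != -1:  # new candidate with the right neighbour
            q = right[w]
            left[q] = w
            heapq.heappush(heap, (ward_linkage(P, start[w], end[w], end[q]), w, q))
    return merges
```

Each merge adds at most two candidates to the heap, so the merge loop performs O(n) heap operations of O(log n) each. In this sketch the dense prefix sums dominate the cost; restricting the cumulative sums to a band of the similarity matrix is what would remove that quadratic term.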
Main file
ambroise_etal_SMPGD2019-poster_1.pdf (436.48 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02790995, version 1 (05-06-2020)

Identifiers

  • HAL Id: hal-02790995, version 1
  • PRODINRA : 466078

Cite

Christophe Ambroise, Alia Dehman, Pierre Neuvial, Guillem Rigaill, Nathalie Vialaneix. Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Statistical Methods for Post Genomic Data (SMPGD 2019), Jan 2019, Barcelona, Spain. 2019. ⟨hal-02790995⟩