A progressive sampling framework for clustering - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Neurocomputing, Year: 2021

A progressive sampling framework for clustering

Abstract

Clustering algorithms are becoming increasingly sophisticated to cope with large data sets of growing complexity. Sample selection methods offer an attractive alternative, as they reduce both memory requirements and execution time. Many sampling algorithms for clustering are efficient, but each has its own limitations with large data sets. In this paper, we introduce a sampling framework for clustering algorithms that builds on both progressive sampling and stratification. Driven by two parameters, the iterative process manages representatives of independent strata that carry similar statistical information regarding the clustering objective. At each iteration, the candidate representatives of the incoming stratum are examined. The key feature of the framework is that new representatives of the incoming stratum are selected only if they improve the representation quality of the already selected set of samples. The algorithm stops when new representatives are no longer needed, which is likely to happen before the whole data set has been examined. Tests conducted on synthetic and real-world data sets showed that the progressive sampling framework yields results similar to those of the sampling algorithm applied to the whole set, with a low computation time. Compared with progressive sampling techniques, the proposed framework allows smaller sample sets to be used without loss of accuracy.
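
The iterative process described in the abstract can be illustrated with a minimal sketch. The abstract does not name the two driving parameters, the way candidate representatives are chosen, or how representation quality is measured, so everything specific here is an assumption made for illustration: the parameters `stratum_size` and `radius`, the use of random strata, and the rule that a candidate is kept only if it lies farther than `radius` from every representative already selected. It should not be read as the authors' algorithm.

```python
import math
import random


def progressive_sample(points, stratum_size, radius, seed=0):
    """Illustrative progressive, stratified sampling loop (NOT the paper's method).

    points       : list of equal-length tuples of floats
    stratum_size : points per stratum (hypothetical parameter)
    radius       : coverage threshold (hypothetical acceptance criterion)
    Returns (representatives, number_of_points_examined).
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    # Assumption: random strata carry similar statistical information.
    rng.shuffle(order)

    representatives = []
    examined = 0
    for start in range(0, len(order), stratum_size):
        stratum = [points[i] for i in order[start:start + stratum_size]]
        examined += len(stratum)
        added = 0
        for candidate in stratum:
            # Keep a candidate only if it improves coverage of the selected
            # set, i.e. no existing representative lies within `radius`.
            if all(math.dist(candidate, r) > radius for r in representatives):
                representatives.append(candidate)
                added += 1
        if added == 0:
            # The incoming stratum needed no new representative: stop early,
            # possibly before the whole data set has been examined.
            break
    return representatives, examined


if __name__ == "__main__":
    rng = random.Random(42)
    # Two synthetic Gaussian blobs, purely for demonstration.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
    data += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(2000)]
    reps, seen = progressive_sample(data, stratum_size=200, radius=1.0)
    print(f"{len(reps)} representatives after examining {seen} of {len(data)} points")
```

In this sketch, the stopping test mirrors the abstract's description: the loop ends as soon as a stratum contributes no representative. A larger `radius` gives fewer representatives and tends to stop earlier, at the cost of a coarser description of the data.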
Main file
S0925231221005567.pdf (809.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03204005, version 1 (09-05-2023)

Licence

Attribution - NonCommercial

Cite

Frédéric Ros, Serge Guillaume. A progressive sampling framework for clustering. Neurocomputing, 2021, 450, pp.48-60. ⟨10.1016/j.neucom.2021.04.029⟩. ⟨hal-03204005⟩