
A progressive sampling framework for clustering

Abstract : Clustering algorithms are becoming increasingly sophisticated to cope with large data sets of growing complexity. Sampling methods are likely to provide an attractive alternative because they can reduce both memory requirements and execution time. Many sampling algorithms for clustering are efficient, but each has its own limitations on large data sets. In this paper, we introduce a sampling framework for clustering algorithms that draws on both progressive sampling and stratification. Driven by two parameters, the iterative process consists of managing representatives of independent strata that carry similar statistical information with respect to the clustering objective. At each iteration, the candidate representatives of the incoming stratum are examined. The key feature of the framework is that new representatives of the incoming stratum are selected only if they improve the representation quality of the samples already selected. The algorithm stops when new representatives are no longer needed, which is likely to happen before the whole data set has been examined. Tests on synthetic and real-world data sets showed that the progressive sampling framework yielded results similar to those of the sampling algorithm applied to the whole data set, at a low computational cost. Compared with existing progressive sampling techniques, the proposed framework allows smaller sample sets to be used without loss of accuracy.
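The loop described in the abstract (scan strata one at a time, keep a candidate only if it improves the current representation, stop once strata stop contributing) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define its quality measure or name its two parameters, so the sketch assumes a simple coverage criterion (a candidate is kept if it lies farther than a hypothetical `radius` from every current representative) and a hypothetical `patience` stopping rule. The function name `progressive_sample` and the parameters `stratum_size`, `radius`, and `patience` are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def progressive_sample(
    data: Sequence[Point],
    stratum_size: int,
    radius: float,      # hypothetical coverage threshold (assumption, not from the paper)
    patience: int = 1,  # hypothetical: stop after this many strata add nothing
    seed: int = 0,
) -> Tuple[List[Point], int]:
    """Return (representatives, number of points examined).

    Illustrative sketch: the data are shuffled and cut into strata of equal
    size; a candidate becomes a representative only if it is not already
    covered, i.e. it lies farther than `radius` from every representative.
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random strata carry similar information

    reps: List[Point] = []
    examined = 0
    idle_strata = 0
    for start in range(0, len(order), stratum_size):
        stratum = [data[i] for i in order[start:start + stratum_size]]
        added = 0
        for candidate in stratum:
            examined += 1
            # Keep the candidate only if it improves coverage of the data.
            if not reps or min(math.dist(candidate, r) for r in reps) > radius:
                reps.append(candidate)
                added += 1
        # Count consecutive strata that contributed no new representative.
        idle_strata = idle_strata + 1 if added == 0 else 0
        if idle_strata >= patience:
            break  # new representatives are no longer needed
    return reps, examined


# Usage: two tight, well-separated blobs of 500 points each.
rng = random.Random(1)
data = [(rng.uniform(0, 0.5), rng.uniform(0, 0.5)) for _ in range(500)]
data += [(10 + rng.uniform(0, 0.5), 10 + rng.uniform(0, 0.5)) for _ in range(500)]
reps, examined = progressive_sample(data, stratum_size=100, radius=1.0)
print(len(reps), examined)
```

Because each blob fits inside a circle of diameter below `radius`, one representative per blob suffices, and the loop typically stops after a couple of strata instead of scanning all 1,000 points. This mirrors the early-stopping behaviour the abstract describes, although the paper's actual quality criterion may differ.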
Document type :
Journal articles
Contributor : Isabelle Nault
Submitted on : Wednesday, April 21, 2021 - 11:23:46 AM
Last modification on : Thursday, April 22, 2021 - 3:31:49 AM



Frédéric Ros, Serge Guillaume. A progressive sampling framework for clustering. Neurocomputing, Elsevier, In press, ⟨10.1016/j.neucom.2021.04.029⟩. ⟨hal-03204005⟩


