A progressive sampling framework for clustering

Frédéric Ros; Serge Guillaume

doi:10.1016/j.neucom.2021.04.029

Journal article, Neurocomputing, Year: 2021


Frédéric Ros

Role: Author
PersonId : 768125
ORCID : 0000-0001-9954-8399

Laboratoire pluridisciplinaire de recherche en ingénierie des systèmes, mécanique et énergétique

Serge Guillaume

Role: Author
PersonId : 837856

Information – Technologies – Analyse Environnementale – Procédés Agricoles

Abstract

Clustering algorithms are becoming increasingly sophisticated to cope with large data sets of increasing complexity. Sampling selection methods are likely to provide an interesting alternative, as they can reduce both memory requirements and execution time. Many sampling algorithms for clustering are efficient, but each has its own limitations with large data sets. In this paper, we introduce a sampling framework for clustering algorithms that inherits from both progressive sampling and stratification concepts. Driven by two parameters, the iterative process consists of managing representatives of independent strata that carry similar statistical information regarding the clustering objective. At each iteration, the candidate representatives of the incoming stratum are examined. The key feature of the framework is that new representatives of the incoming stratum are selected only if they improve the representation quality of the already selected set of samples. The algorithm stops when new representatives are no longer needed, which is likely to happen without examining the whole data set. Tests conducted on synthetic and real-world data sets showed that the progressive sampling framework yielded results similar to those of the sampling algorithm applied to the whole set, in a low computational time. In comparison with progressive sampling techniques, the proposed framework enables smaller sampling sets to be used without loss of accuracy.
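The iterative process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not name the two parameters or define the representation-quality criterion, so `stratum_size`, `radius`, and the distance-based coverage test below are assumptions chosen for illustration only. Strata are formed here by random shuffling, on the assumption that random strata carry similar statistical information.

```python
import math
import random


def progressive_sample(points, stratum_size, radius, seed=0):
    """Illustrative progressive stratified sampling.

    Assumptions (not taken from the paper): `stratum_size` and `radius`
    stand in for the two driving parameters, and a candidate "improves
    the representation" when no selected representative lies within
    `radius` of it.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)  # random strata, assumed statistically similar

    representatives = []
    examined = 0
    for start in range(0, len(shuffled), stratum_size):
        stratum = shuffled[start:start + stratum_size]
        examined += len(stratum)
        added = 0
        for candidate in stratum:
            # Keep the candidate only if it is not already covered.
            if all(math.dist(candidate, r) > radius for r in representatives):
                representatives.append(candidate)
                added += 1
        if added == 0:
            # The incoming stratum needed no new representative: stop early.
            break
    return representatives, examined


# Usage on hypothetical data: two Gaussian blobs in the plane.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(2000)]
points += [(data_rng.gauss(8, 1), data_rng.gauss(8, 1)) for _ in range(2000)]
reps, examined = progressive_sample(points, stratum_size=100, radius=1.0)
print(len(reps), "representatives after examining", examined, "points")
```

With this criterion the loop ends as soon as a whole stratum is already covered by the selected set, which is how the sketch mirrors the stopping rule stated in the abstract; the selected representatives can then be passed to any clustering algorithm in place of the full data set.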

Keywords

stratification progressive selection clustering nearest neighbor

Domains

Computer Science [cs]

Main file

S0925231221005567.pdf (809.19 KB)

Origin: Files produced by the author(s)

Elsevier CCSD agreement

https://hal.inrae.fr/hal-03204005

Submitted on: Tuesday, May 9, 2023, 09:10:47

Last modified on: Tuesday, March 12, 2024, 10:44:47

Long-term archiving on: Thursday, August 10, 2023, 18:38:00

Dates and versions

hal-03204005 , version 1 (09-05-2023)

License

Attribution - NonCommercial

Identifiers

HAL Id : hal-03204005 , version 1
DOI : 10.1016/j.neucom.2021.04.029
PII : S0925-2312(21)00556-7
WOS : 000660410800005

Cite

Frédéric Ros, Serge Guillaume. A progressive sampling framework for clustering. Neurocomputing, 2021, 450, pp.48-60. ⟨10.1016/j.neucom.2021.04.029⟩. ⟨hal-03204005⟩


Collections

UNIV-ORLEANS PRISME-CVL INSA-GROUPE ITAP INSA-CVL INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM
