A new density-based sampling algorithm
Abstract
To face the big data challenge, sampling can be used as a preprocessing step for clustering. In this paper, a hybrid algorithm is proposed: it is density-based while also handling distance concepts. The behavior of the algorithm is investigated using synthetic and real-world data sets. Initial experiments show that it can be accurate, according to the Rand Index, with both \textit{k-means} and \textit{hierarchical} clustering algorithms.