Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Knowledge-Based Systems, Year: 2022

Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN

Abstract

Density-based clustering algorithms have had a large impact on a wide range of application fields. As more data become available, with growing size and varied internal organization, nonparametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery. S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with noticeable differences and advantages, such as working in a differential mode. It is formalized via a very simple hierarchical process that hybridizes distance, k-nearest neighbor and density peaks concepts. It aims at partitioning existing data into clusters until no more clustering can be done. The information delivered allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithm core (S-DBSCANCORE) with judicious input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have densities that differ too much are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.
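The abstract does not give the algorithm itself, so the following is only a loose illustration of the general idea of grouping points by comparing local densities between nearest neighbors rather than against a global threshold. It is not S-DBSCAN or S-DBSCANCORE. The function names, the density proxy (inverse mean distance to the k nearest neighbors), and the parameters `k` and `max_ratio` are all assumptions introduced for this sketch, and the points are assumed to be distinct.

```python
import math


def k_nearest(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs."""
    table = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        table.append(ranked[:k])
    return table


def differential_density_sketch(points, k=3, max_ratio=2.0):
    """Link neighbors of comparable local density; flag the rest as borders.

    Hypothetical illustration only: k, max_ratio and the density proxy are
    assumptions, not the parameters or definitions used by S-DBSCAN.
    """
    neighbors = k_nearest(points, k)
    # Density proxy: inverse of the mean distance to the k nearest neighbors.
    density = [k / sum(d for d, _ in nb) for nb in neighbors]

    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    is_border = [False] * len(points)
    for i, nb in enumerate(neighbors):
        for _, j in nb:
            ratio = max(density[i], density[j]) / min(density[i], density[j])
            if ratio <= max_ratio:
                parent[find(i)] = find(j)  # comparable densities: same group
            else:
                is_border[i] = True  # a neighbor's density differs too much

    # Renumber the union-find roots as consecutive group labels.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
    return labels, is_border


# Two 3x3 grids of equal density plus one isolated point between them.
grid_a = [(x, y) for x in range(3) for y in range(3)]
grid_b = [(x + 8, y + 8) for x in range(3) for y in range(3)]
points = grid_a + grid_b + [(5, 5)]
labels, is_border = differential_density_sketch(points)
```

On this toy input each grid ends up as one group, and the isolated point, whose neighbors are much denser than it is, is flagged as a border and left on its own. The comparison is purely relative between neighbors, which is the only aspect of the abstract's "differential density" wording this sketch tries to echo.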

Dates and versions

hal-03689167, version 1 (07-06-2022)

Cite

Frédéric Ros, Serge Guillaume, Rabia Riad, Mohamed El Hajji. Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN. Knowledge-Based Systems, 2022, 241, pp.108288. ⟨10.1016/j.knosys.2022.108288⟩. ⟨hal-03689167⟩