Detection of natural clusters via S-DBSCAN, a Self-tuning version of DBSCAN
Abstract
Density-based clustering algorithms have had a large impact on a wide range of application fields. As more data become available, with growing size and varied internal organization, nonparametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery. S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with noticeable differences and advantages, since it works in a differential mode. It is formalized as a very simple hierarchical process that hybridizes distance, k-nearest-neighbor, and density-peak concepts. It aims at partitioning the data into clusters until no more clustering can be done. The information delivered allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithm core (S-DBSCANCORE) with judicious input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have too different densities are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and independent of a global density threshold, unlike many existing algorithms. Tests were carried out on 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.
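The abstract does not specify how S-DBSCANCORE computes the differential density, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a local density equal to the inverse of the mean distance to the k nearest neighbors, marks a point as a border when any of its neighbors has a density differing by more than a ratio `max_ratio`, and connects the remaining points through their neighbor links. The function names, the parameters `k` and `max_ratio`, and the density estimate itself are all hypothetical choices made for this example.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def differential_density_clusters(points, k=4, max_ratio=2.0):
    """Illustrative sketch (NOT the published S-DBSCANCORE).

    Returns one label per point: a cluster id >= 0, or -1 for border points.
    """
    n = len(points)
    neigh = knn(points, k)

    # Assumed density estimate: inverse of the mean k-NN distance.
    # The small constant avoids division by zero for duplicate points.
    density = [1.0 / (sum(d for d, _ in nb) / len(nb) + 1e-12) for nb in neigh]

    # A point is a border if any neighbour's density is too different from its own.
    border = []
    for i in range(n):
        too_different = any(
            max(density[i], density[j]) / min(density[i], density[j]) > max_ratio
            for _, j in neigh[i]
        )
        border.append(too_different)

    # Union-find over non-border points linked by a neighbour relation.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        if border[i]:
            continue
        for _, j in neigh[i]:
            if not border[j]:
                parent[find(i)] = find(j)

    # Assign consecutive cluster ids; border points keep the label -1.
    ids = {}
    labels = []
    for i in range(n):
        if border[i]:
            labels.append(-1)
        else:
            labels.append(ids.setdefault(find(i), len(ids)))
    return labels
```

Because the border test compares densities locally rather than against a global threshold, two groups with very different densities can both be recovered with the same `max_ratio`. In this sketch the border points are simply left unassigned; the hierarchical, multi-scale scan described in the abstract is not reproduced here.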
Domains
Life Sciences [q-bio]
Origin: Files produced by the author(s)