ProTraS: A probabilistic traversing sampling algorithm - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Expert Systems with Applications, Year: 2018

ProTraS: A probabilistic traversing sampling algorithm

Abstract

In knowledge discovery on big data, sampling is a building block that can be embedded in a more general framework to speed up existing algorithms and improve scalability. Two challenging and connected problems arise as complexity grows: tuning and timing. ProTraS is a new algorithm that addresses both. It is driven by a single parameter, the sampling cost. The cost is bounded from above using the maximum within-group distance and the group cardinality. The algorithm is iterative: at each step a new representative is added, chosen as the item farthest from the current representative (a farthest-first traversal) in the group with the highest probability of cost reduction. The algorithm is robust to noise and time-optimized. A detailed comparison with alternative algorithms, conducted on various synthetic and real-world data sets, shows that the proposal yields competitive results in terms of quality of representation for clustering, sample size, and sampling time.
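The iterative scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact cost formula, so the sketch assumes that each group contributes its cardinality times its maximum within-group distance, that the total is normalized by the data set size, and that the group with the largest contribution stands in for "the group with the highest probability of cost reduction". The function name `protras_like_sample`, the parameter `eps` (the cost threshold), and the choice of the first representative are hypothetical.

```python
import math


def protras_like_sample(points, eps, first=0):
    """Farthest-first sampling sketch driven by a single cost threshold.

    Assumptions (not from the abstract): group bound = size * radius,
    cost = sum of bounds / n, and the group with the largest bound is split.
    Returns (indices of representatives, final cost bound).
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    n = len(points)
    reps = [first]                       # indices of the representatives
    owner = [0] * n                      # position in reps of each point's group
    d_to_rep = [math.dist(p, points[first]) for p in points]

    while True:
        k = len(reps)
        size = [0] * k                   # group cardinalities
        radius = [0.0] * k               # maximum within-group distances
        farthest = list(reps)            # farthest item of each group
        for i in range(n):
            g = owner[i]
            size[g] += 1
            if d_to_rep[i] > radius[g]:
                radius[g] = d_to_rep[i]
                farthest[g] = i

        bound = [size[g] * radius[g] for g in range(k)]
        cost = sum(bound) / n
        if cost <= eps:
            return reps, cost

        # Split the group with the largest bound at its farthest item.
        g_star = max(range(k), key=bound.__getitem__)
        new = farthest[g_star]
        reps.append(new)

        # Reassign points that are closer to the new representative.
        for i in range(n):
            d = math.dist(points[i], points[new])
            if d < d_to_rep[i]:
                d_to_rep[i] = d
                owner[i] = k


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(protras_like_sample(data, eps=1.0))
```

Under these assumptions a smaller `eps` yields more representatives, and `eps = 0` keeps adding representatives until every distinct point is covered at distance zero.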
Main file
pub00057553.pdf (1.61 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02607456, version 1 (16-05-2020)

Identifiers

Cite

F. Ros, S. Guillaume. ProTraS: A probabilistic traversing sampling algorithm. Expert Systems with Applications, 2018, 105, pp.65-76. ⟨10.1016/j.eswa.2018.03.052⟩. ⟨hal-02607456⟩
91 views
157 downloads
