ProTraS: A probabilistic traversing sampling algorithm

F. Ros; S. Guillaume

doi:10.1016/j.eswa.2018.03.052

Journal article, Expert Systems with Applications, Year: 2018

F. Ros

Role: Author

ORLEANS UNIVERSITY FRA

S. Guillaume

Role: Author
PersonId: 1148420
ORCID: 0000-0002-6769-9982
IdRef: 111748119

Information – Technologies – Analyse Environnementale – Procédés Agricoles

Abstract

In knowledge discovery from big data, sampling is a building block that can be embedded in a more general framework to speed up existing algorithms and help address scalability. Two challenging and connected problems arise as complexity grows: tuning and timing. ProTraS is a new algorithm that addresses both. It is driven by a single parameter, the sampling cost. The cost is bounded from above using the maximum within-group distance and the group cardinality. The algorithm is iterative: at each step it adds a new representative, chosen by farthest-first traversal as the item farthest from the representative of the group with the highest probability of cost reduction. The novel algorithm is robust to noise and time-optimized. A detailed comparison with alternative algorithms, conducted on various synthetic and real-world data sets, shows that the proposal yields competitive results in terms of quality of representation for clustering, sampling size and sampling time.
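The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the cost formula, so the per-group bound used here (group size times maximum distance to the representative, divided by the total number of items), the stopping rule, the random choice of the first representative and the name `protras_sketch` are all assumptions made for illustration.

```python
import math
import random


def protras_sketch(points, epsilon, seed=0):
    """Return indices of representatives chosen by a ProTraS-style loop.

    Assumption (not stated in the abstract): the per-group cost bound is
    (group size) * (max distance to the representative) / (total size),
    and the loop stops once the summed bound is at most `epsilon`.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    n = len(points)
    rng = random.Random(seed)
    first = rng.randrange(n)      # hypothetical choice of the first representative
    reps = [first]
    owner = [first] * n           # representative each point is attached to
    dist = [math.dist(p, points[first]) for p in points]

    while True:
        # Per-group statistics: cardinality, max within-group distance, farthest item.
        size = {r: 0 for r in reps}
        dmax = {r: 0.0 for r in reps}
        far = {r: r for r in reps}
        for i in range(n):
            r = owner[i]
            size[r] += 1
            if dist[i] > dmax[r]:
                dmax[r], far[r] = dist[i], i

        bound = {r: size[r] * dmax[r] / n for r in reps}
        if sum(bound.values()) <= epsilon:
            return reps

        # The group with the largest bound is split: its farthest item
        # becomes a new representative (farthest-first traversal step).
        worst = max(reps, key=lambda r: bound[r])
        new = far[worst]
        reps.append(new)
        # Reattach every point that is closer to the new representative.
        for i in range(n):
            d = math.dist(points[i], points[new])
            if d < dist[i]:
                dist[i], owner[i] = d, new
```

In this sketch a smaller `epsilon` yields more representatives; with `epsilon = 0` every distinct point ends up in the sample, and a very large `epsilon` returns only the first representative. Each pass over the data is linear in the number of points, so the total work grows with the number of points times the number of representatives.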

Keywords

BIG DATA; DATA MINING; CLUSTERING; DISTANCE; COST REDUCTION; DENSITY (SPECIFIC GRAVITY); ITERATIVE METHODS; ALTERNATIVE ALGORITHMS; ITERATIVE ALGORITHM; NOVEL ALGORITHM; SAMPLING ALGORITHM; SAMPLING TIME; SCALABILITY ISSUE; CLUSTERING ALGORITHMS; DISTORTION COST

Domains

Environmental sciences

Main file

pub00057553.pdf (1.61 MB)

Origin: Files produced by the author(s)

Contributor: Migration Irstea Publications

https://hal.inrae.fr/hal-02607456

Submitted on: Saturday, May 16, 2020, 14:16:41

Last modified on: Tuesday, June 25, 2024, 14:08:54

Dates and versions

hal-02607456, version 1 (16-05-2020)

Identifiers

HAL Id: hal-02607456, version 1
DOI: 10.1016/j.eswa.2018.03.052
IRSTEA: PUB00057553

Cite

F. Ros, S. Guillaume. ProTraS: A probabilistic traversing sampling algorithm. Expert Systems with Applications, 2018, 105, pp.65-76. ⟨10.1016/j.eswa.2018.03.052⟩. ⟨hal-02607456⟩

Collections

IRSTEA AGROPOLIS ITAP INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

91 views

157 downloads