INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper. Year: 2014

High density-focused uncertainty sampling for active learning over evolving stream data


Abstract

Data labeling is an expensive and time-consuming task, so carefully choosing which labels to use for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels for a small, representative fraction of the data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. We propose a new active learning method for evolving data streams, DBALStream, based on a combination of density and prediction uncertainty. Our approach decides whether to label an instance by considering whether it lies in a high-density partition of the data space. This focuses labeling effort on the regions of the instance space where more data is concentrated, where the benefits of learning a more accurate classifier are expected to be higher. Instance density is approximated online with a sliding window mechanism, a standard technique for data streams. We compare our method with state-of-the-art active learning strategies on benchmark datasets. The experimental analysis demonstrates good predictive performance of the new approach.
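The idea summarized in the abstract (query a label only when an instance is both uncertain for the classifier and located in a dense region of the recent stream) can be illustrated with a minimal Python sketch. This is not the authors' DBALStream implementation: the class name `DensityUncertaintySampler`, the radius-based density proxy, the margin-based uncertainty measure, and all parameter names and default values are assumptions chosen only for illustration.

```python
from collections import deque
import math


class DensityUncertaintySampler:
    """Toy query strategy (hypothetical, not DBALStream itself): request a
    label only when an instance is both uncertain for the model and lies in
    a dense region of the sliding window of recent instances."""

    def __init__(self, window_size=100, radius=1.0, min_density=0.1, max_margin=0.2):
        # Sliding window of recent instances; the oldest one is dropped automatically.
        self.window = deque(maxlen=window_size)
        self.radius = radius            # assumed neighborhood radius
        self.min_density = min_density  # assumed density threshold
        self.max_margin = max_margin    # assumed uncertainty threshold

    def density(self, x):
        """Fraction of windowed instances within `radius` of x (0.0 if the window is empty)."""
        if not self.window:
            return 0.0
        near = sum(1 for w in self.window if math.dist(x, w) <= self.radius)
        return near / len(self.window)

    @staticmethod
    def margin(proba):
        """Gap between the two largest class probabilities; a small gap means high uncertainty."""
        top = sorted(proba, reverse=True)
        return top[0] - top[1]

    def should_query(self, x, proba):
        """Return True if a label should be requested for x, then add x to the window."""
        dense = self.density(x) >= self.min_density
        uncertain = self.margin(proba) <= self.max_margin
        self.window.append(tuple(x))  # the window is updated after scoring
        return dense and uncertain
```

In use, `should_query(x, proba)` would be called once per arriving instance, where `proba` is the current classifier's class-probability estimate for `x` (at least two classes). An instance far from everything in the window is skipped even when the classifier is unsure about it, which is the behavior the abstract motivates.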
Main file
ienco14.pdf (535.48 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02600408, version 1 (22-04-2021)

Cite

Dino Ienco, Bernhard Pfahringer, Indrė Žliobaitė. High density-focused uncertainty sampling for active learning over evolving stream data. BigMine 2014, Aug 2014, New York, United States. pp.133-148. ⟨hal-02600408⟩