A Semi-Supervised Approach to the Detection and Characterization of Outliers in Categorical Data

Dino Ienco; Ruggero Pensa; Rosa Meo

doi:10.1109/TNNLS.2016.2526063

Journal article, IEEE Transactions on Neural Networks and Learning Systems, Year: 2017

Dino Ienco

Role: Author
PersonId: 6226
IdHAL: dino-ienco
ORCID: 0000-0002-8736-3132
IdRef: 172688183

ADVanced Analytics for data SciencE

Territoires, Environnement, Télédétection et Information Spatiale

Ruggero Pensa

Role: Author

Università degli studi di Torino = University of Turin

Rosa Meo

Role: Author

Università degli studi di Torino = University of Turin

Abstract

In this paper, we introduce a new approach to semi-supervised anomaly detection for categorical data. Given a training set of instances (all belonging to the normal class), we analyze the relationships among features to extract a discriminative characterization of the anomalous instances. Our key idea is to build a model characterizing the features of the normal instances and then use a set of distance-based techniques to discriminate between normal and anomalous instances. We compare our approach with state-of-the-art methods for semi-supervised anomaly detection. We show empirically that a technique designed specifically for categorical data outperforms general-purpose approaches. We also show that, in contrast with other approaches that are opaque because their decisions cannot be easily understood, our proposal produces a discriminative model that can be easily interpreted and used to explore the data.
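As a rough illustration of the general setting described in the abstract (training only on normal instances, then scoring new instances by distance), the following is a minimal sketch. It is not the authors' method: it substitutes a plain Hamming (overlap) distance for the learned distance the paper relies on, and the names `hamming`, `anomaly_score`, the parameter `k`, and the toy data are all hypothetical.

```python
from typing import List, Sequence

Instance = Sequence[str]


def hamming(a: Instance, b: Instance) -> float:
    """Fraction of attributes on which two categorical instances differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def anomaly_score(x: Instance, normal: List[Instance], k: int = 2) -> float:
    """Mean distance from x to its k nearest normal training instances.

    Assumption: a simple k-nearest-neighbor score stands in for the
    distance-based techniques mentioned in the abstract.
    """
    dists = sorted(hamming(x, n) for n in normal)
    return sum(dists[:k]) / k


# Toy training set containing only normal instances (hypothetical data).
normal_train = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("green", "small", "round"),
    ("red", "medium", "round"),
]

typical = ("red", "small", "round")
unusual = ("blue", "large", "square")

typical_score = anomaly_score(typical, normal_train)
unusual_score = anomaly_score(unusual, normal_train)
```

Under these assumptions, an instance whose attribute values all appear together in the normal data receives a low score, while one sharing no values with any normal instance receives the maximum score of 1.0. A threshold on the score would then separate the two classes.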

Keywords

Anomaly detection; categorical data; semi-supervised learning; distance learning

Domains

Machine Learning [cs.LG]; Databases [cs.DB]; Information Retrieval [cs.IR]

Main file

tnnls.pdf (537.04 KB)

Origin: Files produced by the author(s)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01275509

Submitted on: Wednesday, February 17, 2016, 15:36:43

Last modified on: Tuesday, March 12, 2024, 10:46:29

Long-term archiving on: Wednesday, May 18, 2016, 13:08:49

Dates and versions

lirmm-01275509 , version 1 (17-02-2016)

Identifiers

HAL Id: lirmm-01275509, version 1
DOI: 10.1109/TNNLS.2016.2526063
IRSTEA: PUB00047405

Cite

Dino Ienco, Ruggero Pensa, Rosa Meo. A Semi-Supervised Approach to the Detection and Characterization of Outliers in Categorical Data. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28 (5), pp.1017-1029. ⟨10.1109/TNNLS.2016.2526063⟩. ⟨lirmm-01275509⟩

Collections

CIRAD AGROPARISTECH CNRS IRSTEA ADVANSE LIRMM AGROPOLIS TETIS MIPS UNIV-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM
