Theoretical analysis of cross-validation for estimating the risk of the k-nearest neighbor classifier - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement Accéder directement au contenu
Article Dans Une Revue Journal of Machine Learning Research Année : 2018

Theoretical analysis of cross-validation for estimating the risk of the k-nearest neighbor classifier

Résumé

The present work aims at deriving theoretical guaranties on the behavior of some cross-validation procedures applied to the k-nearest neighbors (kNN) rule in the context of binary classification. Here we focus on the leave-p-out cross-validation (LpO) used to assess the performance of the kNN classifier. Remarkably this LpO estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the LpO estimator applied to the kNN classifier. Such results are obtained first by exploiting the connection between the LpO estimator and U-statistics, and second by making an intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. One other important contribution is made by deriving new quantifications of the discrepancy between the LpO estimator and the classification error/risk of the kNN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
Fichier principal
Vignette du fichier
2018_Celisse_Journal of Machine Learning Researchpdf_1 (573.65 Ko) Télécharger le fichier
Origine : Fichiers éditeurs autorisés sur une archive ouverte

Dates et versions

hal-02621332 , version 1 (26-05-2020)

Licence

Paternité

Identifiants

  • HAL Id : hal-02621332 , version 1
  • PRODINRA : 472085
  • WOS : 000452043700001

Citer

Alain Celisse, Tristan Mary-Huard. Theoretical analysis of cross-validation for estimating the risk of the k-nearest neighbor classifier. Journal of Machine Learning Research, 2018, 19, pp.1-54. ⟨hal-02621332⟩
113 Consultations
28 Téléchargements

Partager

Gmail Facebook X LinkedIn More