Learning from Biased Data: A Semi-Parametric Approach - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference Paper. Year: 2021

Learning from Biased Data: A Semi-Parametric Approach

Abstract

We consider risk minimization problems where the (source) distribution $P_S$ of the training observations $Z_1, \ldots, Z_n$ differs from the (target) distribution $P_T$ involved in the risk that one seeks to minimize. Under the natural assumption that $P_S$ dominates $P_T$, i.e. $P_T \ll P_S$, we develop a semiparametric framework for the situation where we do not observe any sample from $P_T$, but rather have access to some auxiliary information at the target population scale. More precisely, assuming that the Radon-Nikodym derivative $\frac{dP_T}{dP_S}(z)$ belongs to a parametric class $\{g(z, \alpha) : \alpha \in \mathcal{A}\}$ and that some (generalized) moments of $P_T$ are available to the learner, we propose a two-step learning procedure to perform the risk minimization task. We first select $\hat{\alpha}$ so as to match the moment constraints as closely as possible, and then reweight each (biased) training observation $Z_i$ by $g(Z_i, \hat{\alpha})$ in the final Empirical Risk Minimization (ERM) algorithm. We establish an $O_{\mathbb{P}}(1/\sqrt{n})$ generalization bound proving that, remarkably, the solution to the weighted ERM problem thus constructed achieves a learning rate of the same order as that attained in the absence of any sampling bias. Beyond these theoretical guarantees, numerical results providing strong empirical evidence of the relevance of the approach promoted in this article are displayed.
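The two-step procedure can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the exponential-tilting class $g(z, \alpha) = \exp(\alpha^\top h(z))$, the moment functions $h(z) = (1, x)$, the Gaussian simulation, the squared loss, and the helper names `moment_features`, `fit_alpha` and `weighted_least_squares` are all hypothetical choices made for illustration. The idea it relies on is the identity $\mathbb{E}_{P_T}[h(Z)] = \mathbb{E}_{P_S}[g(Z, \alpha)\, h(Z)]$ when $g(\cdot, \alpha) = dP_T/dP_S$, so step 1 matches reweighted empirical source moments to the known target moments, and step 2 runs a weighted ERM.

```python
import numpy as np


def moment_features(x):
    """Hypothetical moment functions h(z) = (1, x).
    The constant column encodes the normalisation constraint E_T[1] = 1."""
    return np.column_stack([np.ones_like(x), x])


def fit_alpha(H, target_moments, n_iter=50, tol=1e-10):
    """Step 1 (sketch): choose alpha so that (1/n) sum_i g(Z_i, alpha) h(Z_i)
    matches the known target moments.

    Assumes the exponential-tilting class g(z, alpha) = exp(alpha^T h(z)).
    The moment equations are then the gradient of the convex objective
    L(alpha) = mean(exp(H @ alpha)) - alpha @ target_moments,
    which is minimised here with damped Newton steps.
    """
    alpha = np.zeros(H.shape[1])

    def objective(a):
        return np.mean(np.exp(H @ a)) - a @ target_moments

    for _ in range(n_iter):
        w = np.exp(H @ alpha)
        grad = H.T @ w / len(w) - target_moments   # moment mismatch
        if np.linalg.norm(grad) < tol:
            break
        hess = (H * w[:, None]).T @ H / len(w)     # Jacobian of the mismatch
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Backtracking (Armijo) line search so each update decreases L.
        while t > 1e-12 and objective(alpha - t * step) > objective(alpha) - 1e-4 * t * (grad @ step):
            t *= 0.5
        alpha = alpha - t * step
    return alpha


def weighted_least_squares(X, y, w):
    """Step 2 (sketch): weighted ERM with squared loss, i.e. minimise
    sum_i w_i (y_i - X_i @ theta)^2 by rescaling rows with sqrt(w_i)."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta


# Hypothetical simulation: source X ~ N(0, 1), target X ~ N(1, 1).
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(0.0, 1.0, n)
y = x**2 + rng.normal(0.0, 0.5, n)   # a linear model is deliberately misspecified
# No target sample is used: only the target moments E_T[1] = 1 and E_T[X] = 1.
H = moment_features(x)
alpha_hat = fit_alpha(H, np.array([1.0, 1.0]))
weights = np.exp(H @ alpha_hat)

design = np.column_stack([np.ones(n), x])
theta_weighted = weighted_least_squares(design, y, weights)
theta_plain = weighted_least_squares(design, y, np.ones(n))
print("alpha_hat:", alpha_hat)
print("unweighted ERM:", theta_plain)
print("weighted ERM:", theta_weighted)
```

In this simulated setting the true density ratio is $\exp(x - 1/2)$, which lies in the assumed class with $\alpha = (-1/2, 1)$, so `alpha_hat` should land close to that value. The best linear fit under the target is intercept 0 and slope 2, whereas under the source it is intercept 1 and slope 0; the weighted fit should approach the former while the unweighted fit stays near the latter. The exponential class is used here because it keeps the weights positive and turns moment matching into a convex problem; other parametric classes would need a different solver.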
No full-text file deposited

Dates and versions

hal-04431531, version 1 (01-02-2024)

Identifiers

  • HAL Id: hal-04431531, version 1
  • WOS: 00683104600074

Cite

Patrice Bertail, Stéphan Clémençon, Yannick Guyonvarch, Nathan Noiry. Learning from Biased Data: A Semi-Parametric Approach. ICML 2021 virtual conference - 38th International Conference on Machine Learning, International Conference on Machine Learning, Jul 2021, Online, United States. pp. 803-812. ⟨hal-04431531⟩