MetaCURL: Non-stationary Concave Utility Reinforcement Learning

Bianca Marin Moreno; Margaux Brégère; Pierre Gaillard; Nadia Oudjane

Preprint, working paper. Year: 2024


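Affiliations: Bianca Marin Moreno (1, 2, 3); Margaux Brégère (4, 2); Pierre Gaillard (1); Nadia Oudjane (2, 3)

1 Apprentissage de modèles à partir de données massives (Learning models from massive data)
2 EDF R&D
3 Laboratoire de Finance des Marchés d'Energie (Energy Markets Finance Laboratory)
4 Laboratoire de Probabilités, Statistique et Modélisation (Laboratory of Probability, Statistics and Modeling)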

Bianca Marin Moreno

Role: Author

Apprentissage de modèles à partir de données massives

EDF R&D

Laboratoire de Finance des Marchés d'Energie

Margaux Brégère

Role: Author

Laboratoire de Probabilités, Statistique et Modélisation

EDF R&D

Pierre Gaillard

Role: Author
PersonId: 13025
IdHAL: pierre-gaillard
ORCID: 0000-0002-5665-7904
IdRef: 19041992X

Apprentissage de modèles à partir de données massives

Nadia Oudjane

Role: Author
PersonId: 832229

EDF R&D

Laboratoire de Finance des Marchés d'Energie

Abstract

We explore online learning in episodic loop-free Markov decision processes (MDPs) in non-stationary environments, where both the losses and the transition probabilities change over time. Our focus is the Concave Utility Reinforcement Learning (CURL) problem, an extension of classical RL that handles convex performance criteria on the state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates the traditional Bellman equations. Despite recent solutions to classical CURL, none addresses non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm that runs multiple instances of black-box algorithms over different intervals and aggregates their outputs via a sleeping-experts framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the transition probabilities (with uncertainty and non-stationarity coming only from external noise, independent of the agent's state-action pairs), we achieve optimal dynamic regret without prior knowledge of the MDP changes. Unlike existing approaches for RL, MetaCURL handles fully adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.
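To make the aggregation idea concrete, here is a minimal, generic sketch of sleeping-experts exponential weights. It is not MetaCURL itself: the class name `SleepingExpertsAggregator`, its methods, and the learning rate are hypothetical. Each expert stands in for a black-box learner that is awake only on its own interval, and predictions are assumed to be probability vectors that can be mixed convexly. Asleep experts are treated as if they had predicted the aggregate, so they are charged the aggregate's loss. The sketch also assumes each round's loss can be evaluated on every awake expert's output, which sidesteps the partial-information hurdle described above.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


class SleepingExpertsAggregator:
    """Hypothetical sketch: exponentially weighted aggregation of experts that may be asleep.

    Each expert stands in for a black-box learner that is "awake" only on its own
    interval of episodes. Asleep experts are treated as if they had predicted the
    aggregate itself, so they pay the aggregate's loss in rounds where they sleep.
    """

    def __init__(self, learning_rate: float) -> None:
        self.eta = learning_rate
        self.weights: List[float] = []  # one nonnegative weight per registered expert

    def add_expert(self, prior: float = 1.0) -> int:
        """Register a new expert with the given prior weight; return its index."""
        self.weights.append(prior)
        return len(self.weights) - 1

    def predict(self, preds: Dict[int, Vector]) -> Vector:
        """Convex combination of the awake experts' predictions (keys = awake indices)."""
        total = sum(self.weights[i] for i in preds)
        dim = len(next(iter(preds.values())))
        mix = [0.0] * dim
        for i, p in preds.items():
            share = self.weights[i] / total
            for k in range(dim):
                mix[k] += share * p[k]
        return mix

    def update(self, preds: Dict[int, Vector], loss: Callable[[Vector], float]) -> None:
        """Exponential-weights update once the round's loss function is revealed."""
        meta_loss = loss(self.predict(preds))
        for i in range(len(self.weights)):
            # Awake experts pay their own loss; asleep experts pay the aggregate's loss.
            expert_loss = loss(preds[i]) if i in preds else meta_loss
            self.weights[i] *= math.exp(-self.eta * expert_loss)
        # Renormalize to avoid underflow; predictions only use weight ratios, so they are unchanged.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Usage: two awake experts on a 2-state distribution, plus one asleep expert.
agg = SleepingExpertsAggregator(learning_rate=1.0)
for _ in range(3):
    agg.add_expert()
round_preds = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # expert 2 is asleep this round
first_mix = agg.predict(round_preds)  # equal weights, so [0.5, 0.5]
agg.update(round_preds, loss=lambda q: q[0])  # linear loss: mass placed on state 0
second_mix = agg.predict(round_preds)  # weight shifts toward expert 1
```

Charging sleeping experts the aggregate's loss is one standard way to handle experts that are not always active. It keeps an expert's weight, relative to the awake ones, as if it had followed the aggregate while asleep. That lets an expert started late compete from the moment it wakes up.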

Keywords

Non-stationary Markov process; Convex reinforcement learning; Online learning; Learning with expert advice

Subject areas

Machine Learning [stat.ML]; Machine Learning [cs.LG]; Probability [math.PR]; Statistics [math.ST]

Main file

paper.pdf (367.92 KB)

Origin: Files produced by the author(s)

Contributor: Bianca Marin Moreno

https://hal.science/hal-04591366

Submitted on: Wednesday, May 29, 2024, 09:57:58

Last modified on: Wednesday, October 30, 2024, 13:32:51

Dates and versions

hal-04591366, version 1 (29-05-2024)

License

Attribution

Identifiers

HAL Id: hal-04591366, version 1

Citation

Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane. MetaCURL: Non-stationary Concave Utility Reinforcement Learning. 2024. ⟨hal-04591366⟩


Collections

UGA CNRS INRIA UNIV-DAUPHINE INSMI LJK LJK_GI INRIA2 LJK-GI-THOTH PSL EDF LPSM SORBONNE-UNIVERSITE SU-SCIENCES UP-SCIENCES

