What is an Optimal Policy in Time-Average MDP?
Conference paper, Year: 2023

Abstract

This paper discusses the notion of optimality for time-average MDPs. We argue that while most authors claim to use the "average reward" criterion, the notion they implicitly use is in fact what we call Bellman optimality. We show that it does not coincide with other existing notions of optimality, such as gain-optimality and bias-optimality, but has a strong connection with canonical policies (policies that are optimal for any finite horizon) as well as with the value iteration and policy iteration algorithms.
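
For orientation, the quantities named in the abstract can be written out in standard textbook form. This is only a sketch under assumptions that do not come from this page: a finite MDP with reward $r(s,a)$ and transition kernel $P(s' \mid s,a)$, a stationary policy $\pi$, and an optimal gain $g^*$ that is constant across states. The symbols $g^\pi$, $g^*$ and $h$ are illustrative notation, and the paper's precise definitions may differ.

```latex
% Gain (long-run average reward) of a stationary policy \pi started in state s
g^{\pi}(s) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T-1} r(s_t, a_t) \,\middle|\, s_0 = s \right]

% Gain-optimality: \pi attains the best achievable gain in every state
g^{\pi}(s) = g^{*}(s) := \max_{\pi'} g^{\pi'}(s) \quad \text{for all } s

% Average-reward Bellman equation, with unknowns (g^*, h)
g^{*} + h(s) = \max_{a} \Big\{ r(s,a) + \sum_{s'} P(s' \mid s,a)\, h(s') \Big\}
  \quad \text{for all } s
```

Under this reading, a policy that attains the maximum in the last equation in every state, for some solution $(g^*, h)$, is the natural candidate for what the abstract calls Bellman optimality. Such a policy is gain-optimal, whereas a gain-optimal policy need not attain the maximum in every state, which is consistent with the abstract's claim that the notions do not coincide.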
Main file
mama2023_bellmanopt.pdf (210.05 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04696993 , version 1 (13-09-2024)

Cite

Nicolas Gast, Bruno Gaujal, Kimang Khun. What is an Optimal Policy in Time-Average MDP?. ACM SIGMETRICS Workshop MAMA, Jun 2023, Orlando (FL), United States. pp.30-32, ⟨10.1145/3626570.3626582⟩. ⟨hal-04696993⟩