Abstract
We propose an online algorithm for solving a class of continuous-state Markov decision processes. The algorithm combines classical Q-learning with an asynchronous averaging procedure, which allows Q-function estimates at sampled state–action pairs to be adaptively updated based on observations collected along a single sample trajectory. These estimates are then used to iteratively construct an interpolation-based function approximator of the Q-function. We prove the convergence of the algorithm and provide numerical results to illustrate its performance.
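The ingredients named in the abstract (Q-learning updates along a single trajectory, asynchronous averaging at sampled state–action pairs, and an interpolation-based approximator) can be illustrated with a small sketch. This is not the paper's algorithm. The one-dimensional state space, the toy dynamics and reward, the nearest-grid-point update rule, the `1 / visit count` step size, and the names `interp_q` and `run_q_learning` are all assumptions made for illustration only.

```python
import numpy as np


def interp_q(q_grid, grid, x):
    """Linearly interpolate Q(x, a) for every action a from the grid values."""
    return np.array([np.interp(x, grid, q_grid[:, a])
                     for a in range(q_grid.shape[1])])


def run_q_learning(n_steps=20000, n_grid=11, gamma=0.9, eps=0.2, seed=0):
    """Illustrative Q-learning on [0, 1] along one sample trajectory."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)      # representative states (assumed)
    actions = np.array([-0.1, 0.1])           # hypothetical action set
    q_grid = np.zeros((n_grid, len(actions)))  # Q estimates at grid points
    visits = np.zeros_like(q_grid)            # per-pair update counters

    x = rng.uniform()
    for _ in range(n_steps):
        # Epsilon-greedy action from the interpolated Q-function.
        if rng.uniform() < eps:
            a = int(rng.integers(len(actions)))
        else:
            a = int(np.argmax(interp_q(q_grid, grid, x)))

        # Toy dynamics and reward (assumptions, not taken from the paper).
        noise = 0.05 * rng.standard_normal()
        x_next = float(np.clip(x + actions[a] + noise, 0.0, 1.0))
        reward = -(x_next - 0.5) ** 2

        # Asynchronous averaging: only the grid point nearest to the
        # visited state is updated, with step size 1 / (its visit count).
        i = int(np.argmin(np.abs(grid - x)))
        visits[i, a] += 1
        alpha = 1.0 / visits[i, a]
        target = reward + gamma * np.max(interp_q(q_grid, grid, x_next))
        q_grid[i, a] += alpha * (target - q_grid[i, a])

        x = x_next
    return grid, q_grid, visits
```

Calling `grid, q_grid, visits = run_q_learning()` returns the grid, the Q estimates at the grid points, and the visit counts; `interp_q(q_grid, grid, x)` then evaluates the interpolated Q-function at any state `x` in `[0, 1]`. Because each state–action pair keeps its own counter, pairs that the trajectory visits rarely retain larger step sizes than frequently visited ones.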
| Original language | English |
|---|---|
| Article number | 105782 |
| Journal | Systems and Control Letters |
| Volume | 187 |
| DOIs | |
| State | Published - May 2024 |
Keywords
- Markov processes
- Optimization algorithms
- Statistical learning
- Stochastic optimal control
Title
A Q-learning algorithm for Markov decision processes with continuous state spaces