Abstract
This paper studies constrained average-reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average-reward criteria. The first criterion is the time-average reward, defined as the lower limit of the expected average reward per unit time as the horizon tends to infinity. The second criterion is the ratio-average reward, defined as the lower limit, as n → ∞, of the ratio of the expected total reward during the first n steps to the expected total duration of these n steps. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective function. For unichain problems, we show the existence of randomized stationary policies that are optimal for both criteria. However, optimal mixed stationary policies may differ between the two criteria, even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
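The two criteria described above can be written out as follows. This is only a sketch: the notation is an assumption introduced here for illustration and is not taken from the paper. Here x is the initial state, π a policy, R(t) the total reward accumulated up to time t, and r_k and τ_k the reward and duration of the k-th step.

```latex
% Assumed notation (not from the paper): x = initial state, \pi = policy,
% R(t) = total reward accumulated up to time t,
% r_k, \tau_k = reward and duration of step k.

% Time-average reward: expected average reward per unit time.
W(x,\pi) = \liminf_{t \to \infty} \frac{1}{t}\, \mathbb{E}_x^{\pi}\bigl[ R(t) \bigr]

% Ratio-average reward: expected total reward over the first n steps
% divided by the expected total duration of those n steps.
V(x,\pi) = \liminf_{n \to \infty}
  \frac{\mathbb{E}_x^{\pi}\bigl[ \sum_{k=0}^{n-1} r_k \bigr]}
       {\mathbb{E}_x^{\pi}\bigl[ \sum_{k=0}^{n-1} \tau_k \bigr]}
```

In a constrained problem of the kind the abstract describes, one such quantity would be maximized while others of the same form, built from different reward functions, are required to stay above given levels.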
| Original language | English |
|---|---|
| Pages (from-to) | 257-288 |
| Number of pages | 32 |
| Journal | ZOR. Zeitschrift fur Operations-Research |
| Volume | 39 |
| Issue number | 3 |
| State | Published - Oct 1994 |
Keywords
- average reward
- constrained optimization
- Semi-Markov decision process