Skip to main navigation Skip to search Skip to main content

Constrained discounted dynamic programming

  • Technion-Israel Institute of Technology

Research output: Contribution to journalArticlepeer-review

71 Scopus citations

Abstract

This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M actions more than a nonrandomized stationary one. This result is known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, but the total number of actions which are added by randomization is at most M, (iii) the total number of actions that are added by nonstationarity is at most M. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems. We describe an algorithm to compute optimal policies with properties (i)-(iii) for constrained problems. The policies that satisfy properties (i)-(iii) have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. In contrast, a randomized stationary policy may require an infinite number of randomizations over time.

Original languageEnglish
Pages (from-to)922-945
Number of pages24
JournalMathematics of Operations Research
Volume21
Issue number4
DOIs
StatePublished - Nov 1996

Keywords

  • Additional constraints
  • Discounted Markov decision processes
  • Dynamic programming

Fingerprint

Dive into the research topics of 'Constrained discounted dynamic programming'. Together they form a unique fingerprint.

Cite this