On the convergence of optimal actions for Markov decision processes and the optimality of (s, S) inventory policies

  • Cornell University

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

This article studies convergence properties of optimal values and actions for discounted and average-cost Markov decision processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs with possibly non-compact action sets and unbounded cost functions: (i) convergence of value iterations to optimal values for discounted problems with possibly non-zero terminal costs, (ii) convergence of optimal finite-horizon actions to optimal infinite-horizon actions for total discounted costs, as the time horizon tends to infinity, and (iii) convergence of optimal discounted-cost actions to optimal average-cost actions for infinite-horizon problems, as the discount factor tends to 1. Applied to the setup-cost inventory control problem, the general results on MDPs imply the optimality of (s, S) policies and convergence properties of optimal thresholds. In particular, this article analyzes the setup-cost inventory control problem without two assumptions often used in the literature: (a) the demand is either discrete or continuous or (b) the backordering cost is higher than the cost of backordered inventory if the amount of backordered inventory is large.
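
As a rough illustration of result (i) and of the (s, S) structure mentioned above, the following Python sketch runs value iteration on a small, truncated version of the setup-cost inventory problem. It is not the article's method: the article treats non-compact action sets, unbounded costs, and general demand, whereas this sketch assumes a finite integer inventory range, a short discrete demand distribution, zero terminal costs, and arbitrary parameter values. The function name `value_iteration` and every parameter (`K`, `c`, `h`, `b`, `alpha`, `x_min`, `x_max`, `demand_pmf`) are hypothetical and chosen only for this example.

```python
import numpy as np

def value_iteration(n_iter=200, alpha=0.9, K=5.0, c=1.0, h=1.0, b=4.0,
                    x_min=-20, x_max=30,
                    demand_pmf=(0.1, 0.2, 0.3, 0.2, 0.2)):
    """Value iteration for a truncated setup-cost inventory MDP (illustrative only).

    State x: inventory level (negative values are backorders).
    Action: order up to a level y >= x, paying K if y > x plus c per unit.
    All numerical values are assumptions made for this sketch.
    """
    states = np.arange(x_min, x_max + 1)
    pmf = np.asarray(demand_pmf, dtype=float)
    demands = np.arange(len(pmf))
    n = len(states)

    # Inventory left after demand, for every post-order level y and demand d.
    leftover = states[:, None] - demands[None, :]
    # Truncation assumption: next states below x_min are clipped to x_min.
    next_idx = np.clip(leftover, x_min, x_max) - x_min
    # Expected holding/backordering cost for each post-order level y.
    exp_hold = (pmf * (h * np.maximum(leftover, 0)
                       + b * np.maximum(-leftover, 0))).sum(axis=1)

    v = np.zeros(n)                # zero terminal cost
    policy = states.copy()         # order-up-to level chosen in each state
    gaps = []                      # sup-norm distance between successive iterates
    for _ in range(n_iter):
        # g[y] = c*y + expected holding cost + discounted expected future cost.
        g = c * states + exp_hold + alpha * (pmf * v[next_idx]).sum(axis=1)
        v_new = np.empty(n)
        policy = np.empty(n, dtype=int)
        for i in range(n):
            best, best_y = g[i], i             # option 1: do not order
            if i + 1 < n:                      # option 2: pay K and order up to the best y > x
                j = i + 1 + int(np.argmin(g[i + 1:]))
                if K + g[j] < best:
                    best, best_y = K + g[j], j
            v_new[i] = best - c * states[i]
            policy[i] = states[best_y]
        gaps.append(float(np.max(np.abs(v_new - v))))
        v = v_new
    return states, v, policy, gaps

if __name__ == "__main__":
    states, v, policy, gaps = value_iteration()
    ordering = states[policy > states]
    print("final sup-norm gap:", gaps[-1])
    if ordering.size:
        s = ordering.max()
        print("largest state with an order (s):", s)
        print("order-up-to level there (S):", policy[states == s][0])
```

In this finite, discounted setting the successive gaps shrink at least geometrically with factor `alpha`, which is the elementary counterpart of the convergence of value iterations stated in (i). Printing `policy` lets one check by eye whether the computed actions have the threshold form "order up to S whenever the inventory is at or below s"; the sketch does not prove that structure, and the truncation bounds may distort it near the edges of the state range.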

Original language: English
Pages (from-to): 619-637
Number of pages: 19
Journal: Naval Research Logistics
Volume: 65
Issue number: 8
State: Published - Dec 2018

Keywords

  • average cost per unit time
  • inventory control
  • Markov decision process
  • optimal policy
  • optimality inequality
