Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets

  • National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper describes conditions under which the dynamic programming algorithm converges to optimal values when applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probability distributions on the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.
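
As a rough illustration of the value iteration mentioned in the abstract, the following is a minimal sketch for a finite total-cost MDP. It is not the paper's setting, which covers Borel state and action sets and possibly unbounded costs; the three-state example, the function name `value_iteration`, and the arrays `P` and `c` are assumptions made up for this sketch. Starting from v_0 = 0, each step applies v_{n+1}(s) = min_a [c(s, a) + sum over s' of P(s' | s, a) v_n(s')].

```python
import numpy as np


def value_iteration(P, c, tol=1e-10, max_iter=10_000):
    """Value iteration for a finite total-cost MDP (illustrative only).

    P[a, s, t] is the probability of moving from state s to state t under
    action a; c[s, a] is the nonnegative one-step cost. Iteration starts
    from the zero function.
    """
    n_states = P.shape[1]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # q[s, a] = c[s, a] + sum_t P[a, s, t] * v[t]
        q = c + np.einsum("ast,t->sa", P, v)
        v_new = q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)
        v = v_new
    return v, q.argmin(axis=1)


# Hypothetical example: states 0 and 1 are transient, state 2 is an
# absorbing cost-free state, so total expected costs are finite.
P = np.array([
    # action 0 ("slow")
    [[0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
    # action 1 ("fast")
    [[0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]],
])
c = np.array([
    [1.0, 2.5],  # costs in state 0 for actions 0 and 1
    [1.0, 3.0],  # costs in state 1
    [0.0, 0.0],  # absorbing state incurs no cost
])

v, policy = value_iteration(P, c)
print(np.round(v, 6), policy)
```

In this toy instance the iterates increase monotonically to the optimal total costs because the costs are nonnegative and the cost-free absorbing state is reached with probability one. For general state and action sets such convergence is not automatic, which is the question the paper's conditions address.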

Original language: English
Title of host publication: IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - ADPRL 2014
Subtitle of host publication: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479945535
State: Published - Jan 14 2014
Event: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014 - Orlando, United States
Duration: Dec 9 2014 → Dec 12 2014

Publication series

Name: IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - ADPRL 2014: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Proceedings

Conference

Conference: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014
Country/Territory: United States
City: Orlando
Period: 12/9/14 → 12/12/14
