TY - GEN
T1 - Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets
AU - Feinberg, Eugene A.
AU - Kasyanov, Pavlo O.
AU - Zgurovsky, Michael Z.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/1/14
Y1 - 2014/1/14
N2 - This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probabilities of the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.
AB - This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probabilities of the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.
UR - https://www.scopus.com/pages/publications/84946686373
U2 - 10.1109/ADPRL.2014.7010613
DO - 10.1109/ADPRL.2014.7010613
M3 - Conference contribution
AN - SCOPUS:84946686373
T3 - IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - ADPRL 2014: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Proceedings
BT - IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - ADPRL 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014
Y2 - 9 December 2014 through 12 December 2014
ER -