
On-line control methods via simulation

  • Hyeong Soo Chang
  • Jiaqiao Hu
  • Michael C. Fu
  • Steven I. Marcus
  • Sogang University
  • University of Maryland, College Park

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

In Chap. 5, we consider an approximate rolling-horizon control framework for solving infinite-horizon MDPs with large state/action spaces in an on-line manner by simulation. Specifically, we consider policies in which the system (either the actual system itself or a simulation model of the system) evolves to a particular state that is observed, and the action to be taken in that particular state is then computed on-line at the decision time, with a particular emphasis on the use of simulation. We first present an updating scheme involving multiplicative weights for updating a probability distribution over a restricted set of policies; this scheme can be used to estimate the optimal value function over this restricted set by sampling on the (restricted) policy space. The lower-bound estimate of the optimal value function is used for constructing on-line control policies, called (simulated) policy switching and parallel rollout. We also discuss an upper-bound-based method, called hindsight optimization. Finally, we present an algorithm, called approximate stochastic annealing, which combines Q-learning with the MARS algorithm of Sect. 4.6.1 to directly search the policy space.
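The policy-switching idea mentioned above can be illustrated with a minimal sketch. It is not the chapter's algorithm as published. It only assumes the commonly described form of the method: at the observed state, each policy in a finite base set is evaluated by simulation over a finite horizon, and the action of the policy with the highest estimated value is applied. The toy chain MDP, the function names (`simulate_return`, `estimate_value`, `policy_switching_action`, `chain_step`), and all parameter values are hypothetical and chosen for illustration only.

```python
import random


def simulate_return(state, policy, step, horizon, gamma, rng):
    """Discounted reward of one simulated trajectory that follows `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def estimate_value(state, policy, step, horizon, gamma, num_runs, rng):
    """Sample-average estimate of the finite-horizon value of `policy` at `state`."""
    returns = [
        simulate_return(state, policy, step, horizon, gamma, rng)
        for _ in range(num_runs)
    ]
    return sum(returns) / num_runs


def policy_switching_action(state, policies, step, horizon=10, gamma=0.95,
                            num_runs=200, rng=None):
    """Return the action of the base policy with the best simulated value,
    together with the list of value estimates (one per base policy)."""
    if rng is None:
        rng = random.Random(0)
    values = [
        estimate_value(state, p, step, horizon, gamma, num_runs, rng)
        for p in policies
    ]
    best = max(range(len(policies)), key=values.__getitem__)
    return policies[best](state), values


# Hypothetical toy model: a 5-state chain (states 0..4), actions -1 / +1.
# With probability 0.1 the chosen move is reversed; reward 1 for landing on state 4.
def chain_step(state, action, rng):
    if rng.random() < 0.1:
        action = -action
    next_state = min(4, max(0, state + action))
    return next_state, (1.0 if next_state == 4 else 0.0)


def always_left(state):
    return -1


def always_right(state):
    return +1


action, values = policy_switching_action(
    2, [always_left, always_right], chain_step, rng=random.Random(0)
)
```

In this sketch the maximum of the estimated values plays the role of the lower-bound estimate of the optimal value function over the restricted policy set. The same routine would be called again at each newly observed state, which is what makes the control on-line.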

Original language: English
Title of host publication: Communications and Control Engineering
Publisher: Springer International Publishing
Pages: 179-218
Number of pages: 40
Edition: 9781447150213
State: Published - 2013

Publication series

Name: Communications and Control Engineering
Number: 9781447150213
ISSN (Print): 0178-5354
ISSN (Electronic): 2197-7119

Keywords

  • Expense
  • Peha
