Skip to main navigation Skip to search Skip to main content

Multi-stage adaptive sampling algorithms

  • Hyeong Soo Chang
  • , Jiaqiao Hu
  • , Michael C. Fu
  • , Steven I. Marcus
  • Sogang University
  • University of Maryland, College Park

Research output: Chapter in Book/Report/Conference proceedingChapterpeer-review

Abstract

In Chap. 2, we present simulation-based algorithms for estimating the optimal value function in finite-horizon MDPs with large (possibly uncountable) state spaces, where the usual techniques of policy iteration and value iteration are either computationally impractical or infeasible to implement. We present two adaptive sampling algorithms that estimate the optimal value function by choosing actions to sample in each state visited on a finite-horizon simulated sample path. The first approach builds upon the expected regret analysis of multi-armed bandit models and uses upper confidence bounds to determine which action to sample next, whereas the second approach uses ideas from learning automata to determine the next sampled action. The first approach is also the predecessor of a closely related approach in artificial intelligence (AI) called Monte Carlo tree search that led to a breakthrough in developing the current best computer Go-playing programs.

Original languageEnglish
Title of host publicationCommunications and Control Engineering
PublisherSpringer International Publishing
Pages19-60
Number of pages42
Edition9781447150213
DOIs
StatePublished - 2013

Publication series

NameCommunications and Control Engineering
Number9781447150213
ISSN (Print)0178-5354
ISSN (Electronic)2197-7119

Keywords

  • Expense
  • Lost

Fingerprint

Dive into the research topics of 'Multi-stage adaptive sampling algorithms'. Together they form a unique fingerprint.

Cite this