Skip to main navigation Skip to search Skip to main content

Adaptive adversarial multi-armed bandit approach to two-person zero-sum markov games

  • Hyeong Soo Chang
  • , Jiaqiao Hu
  • , Michael C. Fu
  • , Steven I. Marcus
  • Sogang University
  • University of Maryland, College Park

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

This technical note presents a recursive sampling-based algorithm for finite horizon two-person zero-sum Markov games (MGs) based on the Exp3 algorithm developed by Auer et al. for adaptive adversarial multi-armed bandit problems. We provide a finite-iteration bound to the equilibrium value of the induced sample average approximation game of a given MG and prove asymptotic convergence to the equilibrium value of the given MG. The time and space complexities of the algorithm are independent of the state space of the game.

Original languageEnglish
Article number5378483
Pages (from-to)463-468
Number of pages6
JournalIEEE Transactions on Automatic Control
Volume55
Issue number2
DOIs
StatePublished - Feb 2010

Keywords

  • Multi-armed bandit
  • Sample average approximation
  • Sampling
  • Two-person zero-sum Markov game (MG)

Fingerprint

Dive into the research topics of 'Adaptive adversarial multi-armed bandit approach to two-person zero-sum markov games'. Together they form a unique fingerprint.

Cite this