A Distributed Algorithm for Sequential Decision Making in Multi-Armed Bandit with Homogeneous Rewards*

  • Jingxuan Zhu
  • Romeil Sandhu
  • Ji Liu
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

This paper studies a distributed multi-armed bandit problem over a network of $N$ agents, each of which can communicate only with its neighbors, where neighbor relationships are described by a connected graph $\mathbb{G}$. Each agent makes a sequence of decisions, each time selecting one arm from $M$ candidates, yet it has access only to local samples of the reward for each action, which is a random variable. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best decision. It is shown that when all the agents share a homogeneous distribution of each arm reward, the algorithm achieves guaranteed logarithmic regret for all $N$ agents on the order of $O((1 + 2\rho_2)^2 \log T / N)$ when $T$ is large, where $\rho_2$ denotes the second largest among the absolute values of all the eigenvalues of the Metropolis matrix of $\mathbb{G}$. A sufficient condition under which the proposed distributed algorithm learns faster than its centralized (single-agent) counterpart is provided. Simulations suggest that the algorithm also works when the agents have heterogeneous observations of each arm reward.
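
To make the graph-dependent quantity in the abstract concrete, the following is a minimal sketch (not the paper's algorithm) that builds a Metropolis matrix for a small undirected graph, estimates the second largest absolute eigenvalue, and evaluates the factor (1 + 2ρ₂)²/N that appears in the stated order. Several things here are assumptions rather than details from the abstract: the weight convention w_ij = 1/(1 + max(d_i, d_j)) on edges (other conventions exist), the function names `metropolis_matrix`, `second_largest_abs_eigenvalue`, and `regret_factor`, and the use of power iteration on a deflated matrix. The factor ignores the constants hidden by the O-notation.

```python
import math


def metropolis_matrix(n, edges):
    """Metropolis weights for an undirected graph on nodes 0..n-1.

    Assumed convention: w_ij = 1 / (1 + max(d_i, d_j)) for each edge (i, j),
    with the remaining mass placed on the diagonal so every row sums to 1.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w[i][j] = w[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        w[i][i] = 1.0 - sum(w[i])  # diagonal is still 0 when the sum is taken
    return w


def second_largest_abs_eigenvalue(w, iters=2000):
    """Estimate the second largest absolute eigenvalue of a symmetric,
    doubly stochastic matrix w by power iteration.

    The eigenvalue 1 (all-ones eigenvector) is removed by subtracting
    (1/n) * ones * ones^T, so the dominant magnitude that remains is rho_2.
    """
    n = len(w)
    a = [[w[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    v = [math.sin(i + 1.0) for i in range(n)]  # fixed, non-constant start vector
    rho = 0.0
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_u < 1e-15:  # deflated matrix is numerically zero
            return 0.0
        rho = norm_u / norm_v
        v = [x / norm_u for x in u]
    return rho


def regret_factor(rho2, n):
    """Graph-dependent factor (1 + 2*rho_2)^2 / N from the stated order."""
    return (1.0 + 2.0 * rho2) ** 2 / n
```

For example, `second_largest_abs_eigenvalue(metropolis_matrix(3, [(0, 1), (1, 2)]))` estimates ρ₂ for a three-node path graph, and passing the result to `regret_factor` with `n=3` gives the corresponding factor. Under the assumed weights, a better-connected graph has a smaller ρ₂ and therefore a smaller factor.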

Original language: English
Title of host publication: 2020 59th IEEE Conference on Decision and Control, CDC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3078-3083
Number of pages: 6
ISBN (Electronic): 9781728174471
State: Published - Dec 14 2020
Event: 59th IEEE Conference on Decision and Control, CDC 2020 - Virtual, Jeju Island, Korea, Republic of
Duration: Dec 14 2020 to Dec 18 2020

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2020-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 59th IEEE Conference on Decision and Control, CDC 2020
Country/Territory: Korea, Republic of
City: Virtual, Jeju Island
Period: 12/14/20 to 12/18/20
