TY - GEN
T1 - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits
AU - Xiong, Guojun
AU - Li, Jian
N1 - Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research for this problem focuses exclusively on the settings that players have full access to all arms and receive no reward when pulling the same arm. Hence all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm could sometimes be “walking” and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model, aiming to address aforementioned modeling issues. The goal now is to maximize the reward, however, players can only pull arms from the local subset and only collect a full reward if no other players pull the same arm. We adopt Upper Confidence Bound (UCB) to deal with the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm with near-optimal guarantee on the regret, and can be easily implemented to obtain competitive empirical performance.
AB - Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research for this problem focuses exclusively on the settings that players have full access to all arms and receive no reward when pulling the same arm. Hence all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm could sometimes be “walking” and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model, aiming to address aforementioned modeling issues. The goal now is to maximize the reward, however, players can only pull arms from the local subset and only collect a full reward if no other players pull the same arm. We adopt Upper Confidence Bound (UCB) to deal with the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm with near-optimal guarantee on the regret, and can be easily implemented to obtain competitive empirical performance.
UR - https://www.scopus.com/pages/publications/85168240197
U2 - 10.1609/aaai.v37i9.26251
DO - 10.1609/aaai.v37i9.26251
M3 - Conference contribution
AN - SCOPUS:85168240197
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 10528
EP - 10536
BT - AAAI-23 Technical Tracks 9
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
PB - AAAI Press
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Y2 - 7 February 2023 through 14 February 2023
ER -