TY - GEN
T1 - Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
AU - Lin, Yixuan
AU - Gade, Shripad
AU - Sandhu, Romeil
AU - Liu, Ji
N1 - Publisher Copyright: © 2020 AACC.
PY - 2020/7
Y1 - 2020/7
N2 - This paper considers a distributed reinforcement learning problem in the presence of Byzantine agents. The system consists of a central coordinating authority called the "master agent" and multiple computational entities called "worker agents". The master agent is assumed to be reliable, while a small fraction of the workers can be Byzantine (malicious) adversaries. The workers are interested in cooperatively maximizing a convex combination of the honest (non-malicious) worker agents' long-term returns through communication between the master agent and worker agents. A distributed actor-critic algorithm is studied which makes use of an entry-wise trimmed mean. The algorithm's communication efficiency is improved by allowing the worker agents to send only a scalar-valued variable to the master agent, instead of the entire parameter vector, at each iteration. The improved algorithm involves computing a trimmed mean over only the received scalar-valued variable. It is shown that both algorithms converge almost surely.
AB - This paper considers a distributed reinforcement learning problem in the presence of Byzantine agents. The system consists of a central coordinating authority called the "master agent" and multiple computational entities called "worker agents". The master agent is assumed to be reliable, while a small fraction of the workers can be Byzantine (malicious) adversaries. The workers are interested in cooperatively maximizing a convex combination of the honest (non-malicious) worker agents' long-term returns through communication between the master agent and worker agents. A distributed actor-critic algorithm is studied which makes use of an entry-wise trimmed mean. The algorithm's communication efficiency is improved by allowing the worker agents to send only a scalar-valued variable to the master agent, instead of the entire parameter vector, at each iteration. The improved algorithm involves computing a trimmed mean over only the received scalar-valued variable. It is shown that both algorithms converge almost surely.
UR - https://www.scopus.com/pages/publications/85089562669
U2 - 10.23919/ACC45564.2020.9147381
DO - 10.23919/ACC45564.2020.9147381
M3 - Conference contribution
AN - SCOPUS:85089562669
T3 - Proceedings of the American Control Conference
SP - 3953
EP - 3958
BT - 2020 American Control Conference, ACC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 American Control Conference, ACC 2020
Y2 - 1 July 2020 through 3 July 2020
ER -