
A multi-agent off-policy actor-critic algorithm for distributed reinforcement learning

  • Wesley Suttle
  • Zhuoran Yang
  • Kaiqing Zhang
  • Zhaoran Wang
  • Tamer Basar
  • Ji Liu
  • Stony Brook University
  • Princeton University
  • University of Illinois at Urbana-Champaign
  • Northwestern University

Research output: Contribution to journal › Conference article › Peer-review

36 Scopus citations

Abstract

This paper extends off-policy reinforcement learning to the multi-agent case, in which a set of networked agents, communicating with their neighbors according to a time-varying graph, collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation and proves its convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work on both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop a new multi-agent off-policy actor-critic algorithm and give convergence guarantees for it. The theoretical results are validated empirically.
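
The evaluation step described above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a generic linear emphatic TD(λ) update (follow-on trace F, emphasis M, eligibility trace e) run locally by every agent on its private reward, followed by a consensus step in which each agent averages its parameters with its neighbors using a doubly stochastic weight matrix that changes over time. The names `PHI`, `mix`, `multi_agent_etd` and `sample_trajectory`, the 3-state toy MDP, the behavior and target policies, the weight matrices and all step sizes are hypothetical choices made only for illustration. The actor (policy-improvement) step is not shown.

```python
import random

# Hypothetical tabular (one-hot) features for a toy 3-state MDP.
PHI = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(thetas, C):
    """Consensus step (assumed form): theta_i <- sum_j C[i][j] * theta_j."""
    n, d = len(thetas), len(thetas[0])
    return [[sum(C[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def multi_agent_etd(trajectory, phi, n_agents, graphs,
                    gamma=0.8, lam=0.0, alpha=0.01, interest=1.0):
    """Sketch: local linear emphatic TD(lambda) plus consensus averaging.

    trajectory: list of (s, rewards, rho, s_next); rewards holds one private
    reward per agent, rho is the joint importance ratio pi(a|s) / mu(a|s).
    graphs: doubly stochastic weight matrices, cycled over time to mimic a
    time-varying communication graph.
    """
    d = len(phi[0])
    thetas = [[0.0] * d for _ in range(n_agents)]
    F, e, rho_prev = 0.0, [0.0] * d, 0.0
    for t, (s, rewards, rho, s_next) in enumerate(trajectory):
        # Follow-on trace, emphasis and eligibility trace depend only on the
        # shared state/action stream, so every agent holds the same copy.
        F = gamma * rho_prev * F + interest
        M = lam * interest + (1.0 - lam) * F
        e = [rho * (gamma * lam * e_k + M * f_k) for e_k, f_k in zip(e, phi[s])]
        rho_prev = rho
        # Local emphatic TD step using each agent's private reward.
        local = []
        for i in range(n_agents):
            delta = (rewards[i] + gamma * dot(thetas[i], phi[s_next])
                     - dot(thetas[i], phi[s]))
            local.append([w + alpha * delta * e_k for w, e_k in zip(thetas[i], e)])
        # Consensus: average with neighbors under the current graph.
        thetas = mix(local, graphs[t % len(graphs)])
    return thetas


def sample_trajectory(n_steps, n_agents, seed=0):
    """Hypothetical MDP: action 0 moves to the next state, action 1 jumps at random."""
    rng = random.Random(seed)
    mu, pi = [0.5, 0.5], [0.7, 0.3]  # behavior and target policies (state-independent)
    traj, s = [], 0
    for _ in range(n_steps):
        a = 0 if rng.random() < mu[0] else 1
        s_next = (s + 1) % 3 if a == 0 else rng.randrange(3)
        # Each agent is rewarded in a different state (private rewards).
        rewards = [1.0 if (s + i) % 3 == 0 else 0.0 for i in range(n_agents)]
        traj.append((s, rewards, pi[a] / mu[a], s_next))
        s = s_next
    return traj


if __name__ == "__main__":
    # Two sparse graphs that are only connected when taken together.
    C1 = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
    C2 = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
    for theta in multi_agent_etd(sample_trajectory(5000, 3), PHI, 3, [C1, C2]):
        print([round(w, 3) for w in theta])
```

In this sketch the traces are shared and every weight matrix is doubly stochastic, so the network-average parameter vector follows exactly the single-agent emphatic TD recursion driven by the agents' average reward. This is one way to see why consensus-based schemes of this kind can track a team-average objective even though no agent observes the other agents' rewards.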

Original language: English
Pages (from-to): 1549-1554
Number of pages: 6
Journal: IFAC-PapersOnLine
Volume: 53
State: Published - 2020
Event: 21st IFAC World Congress 2020 - Berlin, Germany
Duration: Jul 12 2020 to Jul 17 2020

Keywords

  • Adaptive control of multi-agent systems
  • Consensus and reinforcement learning control
