Skip to main navigation Skip to search Skip to main content

Lifelong policy gradient learning of factored policies for faster training without forgetting

  • Western University
  • University of Pennsylvania

Research output: Contribution to journalConference articlepeer-review

27 Scopus citations

Abstract

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong learning setting, in which an agent is faced with multiple consecutive tasks over its lifetime, reusing information from previously seen tasks can substantially accelerate the learning of new tasks. We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process. We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines, and completely avoids catastrophic forgetting on a variety of challenging domains.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume2020-December
StatePublished - 2020
Event34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: Dec 6 2020Dec 12 2020

Fingerprint

Dive into the research topics of 'Lifelong policy gradient learning of factored policies for faster training without forgetting'. Together they form a unique fingerprint.

Cite this