UNDERSTANDING CONSTRAINT INFERENCE IN SAFETY-CRITICAL INVERSE REINFORCEMENT LEARNING

  • Bo Yue
  • Shufan Wang
  • Ashish Gaurav
  • Jian Li
  • Pascal Poupart
  • Guiliang Liu
  • The Chinese University of Hong Kong, Shenzhen
  • Stony Brook University
  • University of Waterloo
  • Vector Institute

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

In practical applications, the underlying constraint knowledge is often unknown and difficult to specify. To address this issue, recent advances in Inverse Constrained Reinforcement Learning (ICRL) have focused on inferring these constraints from expert demonstrations. However, the ICRL approach typically characterizes constraint learning as a tri-level optimization problem, which is inherently complex due to its interdependent variables and multiple layers of optimization. Considering these challenges, a critical question arises: Can we implicitly embed constraint signals into reward functions and effectively solve this problem using a classic reward inference algorithm? The resulting method, known as Inverse Reward Correction (IRC), merits investigation. In this work, we conduct a theoretical analysis comparing the sample complexities of both solvers. Our findings confirm that the IRC solver achieves lower sample complexity than its ICRL counterpart. Nevertheless, this reduction in complexity comes at the expense of generalizability. Specifically, in the target environment, the reward correction terms may fail to guarantee the safety of the resulting policy, whereas this issue can be effectively mitigated by transferring the cost functions via the ICRL solver. Advancing our inquiry, we investigate conditions under which the ICRL solver ensures ε-optimality when transferring to new environments. Empirical results across various environments validate our theoretical findings, underscoring the nuanced trade-offs between complexity reduction and generalizability in safety-critical applications.

Original language: English
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 21162-21189
Number of pages: 28
ISBN (Electronic): 9798331320850
State: Published - 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: Apr 24 2025 to Apr 28 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 04/24/25 to 04/28/25
