
DISTORTION RISK MEASURE-BASED DEEP REINFORCEMENT LEARNING

  • Jinyang Jiang
  • Bernd Heidergott
  • Jiaqiao Hu
  • Yijie Peng
  • Peking University
  • Xiangjiang Laboratory
  • Vrije Universiteit Amsterdam

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

Mainstream reinforcement learning (RL) typically focuses on maximizing expected cumulative rewards. In this paper, we explore a risk-sensitive RL setting where the objective is to optimize the distortion risk measure (DRM), a criterion better reflecting human risk perception. We parameterize the action selection policy by neural networks and propose a novel policy gradient algorithm, DRM-based Policy Optimization (DPO), along with its accelerated variant, DRM-based Proximal Policy Optimization (DPPO), to address deep RL tasks with DRM objectives. DPO integrates three coupled recursions operating at different timescales to estimate gradient components and update parameters simultaneously. Our experiments provide numerical results across diverse scenarios, demonstrating that our proposed algorithms outperform the existing baselines under the DRM criterion.
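
To make the DRM criterion concrete, the following is a minimal sketch of the standard order-statistic estimator of a distortion risk measure from sampled returns. It is not the DPO or DPPO algorithm, whose details the abstract does not give. The function names (`empirical_drm`, `identity_distortion`, `cvar_distortion`) and the tail convention (the distortion is applied to the survival function, so larger outcomes receive the distorted weight) are assumptions made for illustration; the paper may use a different convention.

```python
from typing import Callable, Sequence


def empirical_drm(samples: Sequence[float],
                  distortion: Callable[[float], float]) -> float:
    """Estimate a distortion risk measure from i.i.d. samples.

    Uses the order-statistic form
        sum_i x_(i) * [g((n - i + 1) / n) - g((n - i) / n)],
    where x_(1) <= ... <= x_(n) and g is a nondecreasing distortion
    function with g(0) = 0 and g(1) = 1 (assumed, not checked here).
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must be non-empty")
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # Weight of the i-th smallest sample: increment of g over the
        # empirical survival probabilities on either side of x_(i).
        weight = distortion((n - i + 1) / n) - distortion((n - i) / n)
        total += x * weight
    return total


def identity_distortion(u: float) -> float:
    """g(u) = u recovers the ordinary sample mean."""
    return u


def cvar_distortion(alpha: float) -> Callable[[float], float]:
    """g(u) = min(u / alpha, 1) averages the largest alpha-fraction of outcomes."""
    def g(u: float) -> float:
        return min(u / alpha, 1.0)
    return g


if __name__ == "__main__":
    returns = [1.0, 2.0, 3.0, 4.0]
    print(empirical_drm(returns, identity_distortion))   # sample mean
    print(empirical_drm(returns, cvar_distortion(0.5)))  # mean of the top half
```

With the identity distortion the estimator reduces to the expected-value objective of mainstream RL; other choices of the distortion function reweight the tails of the return distribution, which is what makes the criterion risk-sensitive.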

Original language: English
Title of host publication: 2024 Winter Simulation Conference, WSC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2595-2606
Number of pages: 12
ISBN (Electronic): 9798331534202
State: Published - 2024
Event: 2024 Winter Simulation Conference, WSC 2024 - Orlando, United States
Duration: Dec 15 2024 to Dec 18 2024

Publication series

Name: Proceedings - Winter Simulation Conference
ISSN (Print): 0891-7736

Conference

Conference: 2024 Winter Simulation Conference, WSC 2024
Country/Territory: United States
City: Orlando
Period: 12/15/24 to 12/18/24
