TY - CONF
T1 - Distortion Risk Measure-Based Deep Reinforcement Learning
AU - Jiang, Jinyang
AU - Heidergott, Bernd
AU - Hu, Jiaqiao
AU - Peng, Yijie
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Mainstream reinforcement learning (RL) typically focuses on maximizing expected cumulative rewards. In this paper, we explore a risk-sensitive RL setting where the objective is to optimize the distortion risk measure (DRM), a criterion better reflecting human risk perception. We parameterize the action selection policy by neural networks and propose a novel policy gradient algorithm, DRM-based Policy Optimization (DPO), along with its accelerated variant, DRM-based Proximal Policy Optimization (DPPO), to address deep RL tasks with DRM objectives. DPO integrates three coupled recursions operating at different timescales to estimate gradient components and update parameters simultaneously. Our experiments provide numerical results across diverse scenarios, demonstrating that our proposed algorithms outperform the existing baselines under the DRM criterion.
UR - https://www.scopus.com/pages/publications/85217620140
U2 - 10.1109/WSC63780.2024.10838808
DO - 10.1109/WSC63780.2024.10838808
M3 - Conference contribution
AN - SCOPUS:85217620140
T3 - Proceedings - Winter Simulation Conference
SP - 2595
EP - 2606
BT - 2024 Winter Simulation Conference, WSC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Winter Simulation Conference, WSC 2024
Y2 - 15 December 2024 through 18 December 2024
ER -