A Method Integrating Q-Learning with Approximate Dynamic Programming for Gantry Work Cell Scheduling

  • Stony Brook University
  • University of Virginia

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

This article formulates real-time gantry scheduling in a gantry work cell, where material transfer is driven by gantries, as a Markov decision process (MDP). Classical learning and planning methods for solving MDP optimization problems are discussed. An innovative method, called "Q-ADP," is proposed to integrate reinforcement learning (RL) with approximate dynamic programming (ADP). Q-ADP uses the model-free Q-learning algorithm to learn state values through interactions with the environment; meanwhile, planning steps during the learning process use ADP to keep updating state values along several sample paths. A model of one-step transition probabilities, built from the machines' reliability model, serves the ADP algorithm. To demonstrate the effectiveness of this method, a numerical study compares the production performance of Q-ADP with that of a standard Q-learning algorithm. The simulation results show that Q-ADP outperforms standard Q-learning for the same length of training. They also show that, because it repeatedly updates state values along sample paths, Q-ADP requires less data for the gantry policy to converge, which makes the method promising when real data are limited.

Note to Practitioners - The goal of this work is to find a near-optimal gantry assignment policy that realizes real-time control of material-handling gantry/robot movements in gantry work cells. Assigning gantries properly, based on the real-time state of the production system, can prevent machine stoppages caused by material shortage and consequently improve production performance. This gantry scheduling is a sequential decision-making problem and can be formulated as a Markov decision process (MDP). To solve the MDP, an algorithm integrating model-free Q-learning and model-based approximate dynamic programming (ADP) is proposed. By learning directly from interaction with the environment, the method avoids the bias that a designed model could introduce. Meanwhile, a planning process during learning speeds up convergence of the policy, which particularly benefits scenarios where real data are insufficient.
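The abstract does not specify the authors' state space, reward, or ADP update, so the following is only a minimal sketch under assumed settings, not the paper's Q-ADP. It uses a hypothetical two-machine cell in which each machine is up or down with Bernoulli reliability (assumed parameters `P_FAIL` and `P_REPAIR`), a single gantry loads one machine per step, and the reward is the number of parts produced. Each real step performs a model-free Q-learning update; planning then rolls out a few sample paths through a one-step transition model built from the reliability parameters and applies expected (ADP-style) backups along them. This planning loop resembles Dyna-Q or real-time dynamic programming and stands in for the paper's ADP step; all function names are illustrative.

```python
import itertools
import random

# Hypothetical toy gantry cell: two machines, one gantry. All parameters are assumptions.
P_FAIL = (0.1, 0.2)    # per-step probability that an up machine fails
P_REPAIR = (0.5, 0.4)  # per-step probability that a down machine is repaired
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1
ACTIONS = (0, 1)       # action a = gantry loads a part into machine a

# State: (loaded0, loaded1, up0, up1), each 0 or 1.
STATES = list(itertools.product((0, 1), repeat=4))


def deterministic_part(state, action):
    """Gantry loads machine `action`; every loaded, up machine finishes one part."""
    loaded = list(state[:2])
    up = state[2:]
    loaded[action] = 1
    reward = 0
    for i in range(2):
        if loaded[i] and up[i]:
            reward += 1
            loaded[i] = 0  # the part is consumed, so the machine needs a new one
    return tuple(loaded), up, reward


def transition_model(state, action):
    """One-step model built from machine reliability: list of (prob, next_state, reward)."""
    loaded, up, reward = deterministic_part(state, action)
    outcomes = []
    for next_up in itertools.product((0, 1), repeat=2):
        prob = 1.0
        for i in range(2):
            p_up = 1 - P_FAIL[i] if up[i] else P_REPAIR[i]
            prob *= p_up if next_up[i] else 1 - p_up
        outcomes.append((prob, loaded + next_up, reward))
    return outcomes


def env_step(state, action, rng):
    """Sample one transition; stands in for interacting with the real cell."""
    outcomes = transition_model(state, action)
    _, next_state, reward = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
    return next_state, reward


def greedy(Q, state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_adp_sketch(n_steps=5000, n_paths=3, horizon=5, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = (0, 0, 1, 1)
    for _ in range(n_steps):
        # Learning: model-free, epsilon-greedy Q-learning update on a sampled transition.
        action = rng.choice(ACTIONS) if rng.random() < EPS else greedy(Q, state)
        next_state, reward = env_step(state, action, rng)
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        # Planning: expected backups along sample paths drawn from the reliability model.
        for _ in range(n_paths):
            s = state
            for _ in range(horizon):
                a = greedy(Q, s)
                outcomes = transition_model(s, a)
                expected = sum(p * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS))
                               for p, s2, r in outcomes)
                Q[(s, a)] += ALPHA * (expected - Q[(s, a)])
                s = rng.choices([o[1] for o in outcomes], weights=[o[0] for o in outcomes])[0]
        state = next_state
    return Q


if __name__ == "__main__":
    Q = q_adp_sketch()
    policy = {s: greedy(Q, s) for s in STATES}
    print(policy)
```

Setting `n_paths=0` removes the planning loop and leaves plain epsilon-greedy Q-learning, which gives a simple baseline for the kind of comparison the abstract describes.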

Original language: English
Article number: 9069276
Pages (from-to): 85-93
Number of pages: 9
Journal: IEEE Transactions on Automation Science and Engineering
Volume: 18
Issue number: 1
State: Published - Jan 2021

Keywords

  • Approximate dynamic programming (ADP)
  • Markov decision process (MDP)
  • Q-learning
  • gantry scheduling
  • planning and learning
