
AlloX: Compute allocation in hybrid clusters

  • Stony Brook University
  • University of Michigan, Ann Arbor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

86 Scopus citations

Abstract

Modern deep learning frameworks support a variety of hardware, including CPU, GPU, and other accelerators, to perform computation. In this paper, we study how to schedule jobs over such interchangeable resources, each with a different rate of computation, to optimize performance while providing fairness among users in a shared cluster. We demonstrate theoretically and empirically that existing solutions and their straightforward modifications perform poorly in the presence of interchangeable resources, which motivates the design and implementation of AlloX. At its core, AlloX transforms the scheduling problem into a min-cost bipartite matching problem and provides dynamic fair allocation over time. We theoretically prove its optimality in an ideal, offline setting and show empirically that it works well in the online scenario by integrating it with Kubernetes. Evaluations on a small-scale CPU-GPU hybrid cluster and large-scale simulations highlight that AlloX can reduce the average job completion time significantly (by up to 95% when the system load is high) while providing fairness and preventing starvation.
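The abstract does not spell out the matching formulation, so the following is only a minimal sketch of one textbook way to cast offline scheduling on interchangeable resources as min-cost bipartite matching; it is not the authors' implementation. It assumes each job is matched to a (resource, position-from-the-end) slot, and that placing a job in the k-th-from-last slot of a resource costs k times its processing time there, so the matching cost equals the sum of completion times. The job names, processing times, and function names (`min_cost_assignment`, `schedule_jobs`) are hypothetical, and arrivals over time, fairness, and the Kubernetes integration are all ignored.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm: match every row to a distinct column at minimum cost.

    cost is an n x m matrix with n <= m. Returns a list col_of_row of length n.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (m + 1)    # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            col_of_row[p[j] - 1] = j - 1
    return col_of_row


def schedule_jobs(times, resources):
    """times[job][resource] = processing time (hypothetical inputs).

    Returns (per-resource job order, sum of job completion times).
    """
    jobs = sorted(times)
    n = len(jobs)
    # Column r*n + (k-1) is slot "k-th from the end" on resource r.
    cost = [[k * times[job][res] for res in resources for k in range(1, n + 1)]
            for job in jobs]
    slots = {res: [] for res in resources}
    for job, col in zip(jobs, min_cost_assignment(cost)):
        slots[resources[col // n]].append((col % n + 1, job))
    # A larger position-from-the-end means the job runs earlier.
    order = {res: [job for _, job in sorted(s, reverse=True)]
             for res, s in slots.items()}
    total = 0
    for res, queue in order.items():
        clock = 0
        for job in queue:
            clock += times[job][res]
            total += clock
    return order, total


# Made-up processing times for three jobs on a CPU and a GPU.
times = {"A": {"CPU": 10, "GPU": 2},
         "B": {"CPU": 4, "GPU": 3},
         "C": {"CPU": 6, "GPU": 1}}
order, total = schedule_jobs(times, ["CPU", "GPU"])
print(order, total)
```

In this toy instance the matching sends job B to the CPU even though it runs faster on the GPU, because queueing behind the other two jobs would cost more than the slower processor does. That trade-off is the kind of decision a per-job "pick the fastest device" rule cannot make.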

Original language: English
Title of host publication: Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450368827
DOIs
State: Published - Apr 17 2020
Event: 15th European Conference on Computer Systems, EuroSys 2020 - Virtual, Online, Greece
Duration: Apr 27 2020 to Apr 30 2020

Publication series

Name: Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020

Conference

Conference: 15th European Conference on Computer Systems, EuroSys 2020
Country/Territory: Greece
City: Virtual, Online
Period: 04/27/20 to 04/30/20
