
KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

GPU spatial sharing among jobs is an effective approach to increase resource utilization and reduce the monetary and environmental costs of running deep learning workloads. While hardware support for GPU spatial sharing already exists, accurately predicting GPU interference between colocated workloads remains a concern. This makes it challenging to improve GPU utilization by sharing the GPU between workloads without severely impacting their performance. Existing approaches to identify and mitigate GPU interference often require extensive profiling and/or hardware modifications, making them difficult to deploy in practice. This paper presents KACE, a lightweight, prediction-based approach to effectively colocate workloads on a given GPU. KACE adequately predicts colocation interference via exclusive kernel metrics using limited training data and minimal training time, eliminating the need for extensive online profiling of each new workload colocation. Experimental results using various training and inference workloads show that KACE outperforms existing rule-based and prediction-based policies by 16% and 11%, on average, respectively, and is within 10% of the performance achieved by an offline-optimal oracle policy.

Original language: English
Title of host publication: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 460-469
Number of pages: 10
ISBN (Electronic): 9798400712869
DOIs
State: Published - Nov 20 2024
Event: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024 - Redmond, United States
Duration: Nov 20 2024 to Nov 22 2024

Publication series

Name: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

Conference

Conference: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Country/Territory: United States
City: Redmond
Period: 11/20/24 to 11/22/24

Keywords

  • Cloud Computing
  • GPU Sharing
  • Systems for ML
