
Energy-efficient GPU SM allocation

  • Stony Brook University

Research output: Contribution to journal › Article › peer-review

Abstract

GPU sharing between workloads is an effective approach to increase GPU utilization and reduce idle power waste. To minimize resource contention under GPU sharing, current architectures allow users to allocate core GPU compute resources exclusively to workloads. However, identifying the most efficient GPU compute resource allocation for colocated workloads is challenging, as it requires balancing potential performance degradation and power savings. This paper presents a framework for finding the most energy-efficient compute allocation for colocated workload pairs under NVIDIA MPS using lightweight prediction models. Experimental results, using a range of training, inference, and general CUDA workloads, demonstrate that our solution outperforms the equal sharing strategy by 35%, on average, and is within 1.5% of the offline optimal strategy.
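The allocation problem described above can be illustrated with a minimal sketch. It is not the paper's framework: the names `candidate_splits`, `most_efficient_split`, and `toy_energy` are hypothetical, the energy predictor is a made-up stand-in for the lightweight prediction models, and the sketch assumes the two workloads receive exclusive SM shares that sum to 100%.

```python
from typing import Callable, List, Tuple

Split = Tuple[int, int]  # (SM percentage for workload A, SM percentage for workload B)


def candidate_splits(step: int = 10) -> List[Split]:
    """Enumerate exclusive splits (a, b) with a + b == 100 and both shares nonzero."""
    return [(a, 100 - a) for a in range(step, 100, step)]


def most_efficient_split(
    predict_energy: Callable[[int, int], float], step: int = 10
) -> Split:
    """Return the candidate split with the lowest predicted energy."""
    return min(candidate_splits(step), key=lambda s: predict_energy(*s))


def toy_energy(a: int, b: int) -> float:
    """Made-up predictor (assumption): runtime scales inversely with SM share,
    workload A has three times the work of B, and power draw is constant."""
    makespan = max(300.0 / a, 100.0 / b)  # the pair finishes when the slower one does
    power_watts = 100.0
    return power_watts * makespan


best = most_efficient_split(toy_energy)
equal = (50, 50)
print("best split:", best)
print("predicted energy vs. equal sharing:", toy_energy(*best), toy_energy(*equal))
```

Under NVIDIA MPS, a chosen share could then be applied to each client process through the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable before that process creates its CUDA context. A real predictor would replace `toy_energy` with models fitted to measured runtime and power.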

Original language: English
Pages (from-to): 33-38
Number of pages: 6
Journal: Performance Evaluation Review
Volume: 53
Issue number: 2
State: Published - Aug 27 2025
