MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation

  • Stony Brook University
  • Total S.A.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

MPI+X is the most popular hybrid programming model for distributed computation on modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers would ideally like to implement multi-node distributed parallel computing through a single coherent programming model. As the de facto standard for parallel programming, OpenMP has been one of the most influential programming models in parallel computing. Recent work has shown that the OpenMP target offloading model can be used to program distributed accelerator-based HPC systems with marginal changes to the application. However, the UCX-based version of remote OpenMP offloading still has many limitations in terms of performance overhead and ease of use of the plugin. In this work, we have implemented a new MPI-based remote OpenMP offloading plugin. Compared with the UCX-based version, the new MPI-based plugin is significantly improved in terms of performance, scalability, and ease of use. Our work is evaluated using one proxy app, XSBench, and an industrial-level seismic modeling code, Minimod. Results show that, compared to the optimized UCX-based version, our optimizations can reduce offloading latency by up to 70% and raise application parallel efficiency by 68% when running with 16 GPUs on data-bound applications. In particular, the introduction of locality-aware offloading gives developers of HPC programs greater possibilities to take full advantage of modern hierarchical heterogeneous computing devices.

Original language: English
Title of host publication: PMAM 2023 - Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, Part of PPoPP 2023
Publisher: Association for Computing Machinery, Inc
Pages: 50-59
Number of pages: 10
ISBN (Electronic): 9798400701153
DOIs
State: Published - Feb 25 2023
Event: 14th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2023 - Part of PPoPP 2023 - Montreal, Canada
Duration: Feb 26 2023 → Feb 26 2023

Publication series

Name: PMAM 2023 - Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, Part of PPoPP 2023

Conference

Conference: 14th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2023 - Part of PPoPP 2023
Country/Territory: Canada
City: Montreal
Period: 02/26/23 → 02/26/23

Keywords

  • distributed computing
  • GPGPU
  • OpenMP
