TY - GEN
T1 - Towards a Scalable and Efficient PGAS-Based Distributed OpenMP
AU - Shan, Baodi
AU - Araya-Polo, Mauricio
AU - Chapman, Barbara
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - MPI+X has been the de facto standard for distributed memory parallel programming. It is used primarily as an explicit two-sided communication model, which often leads to complex and error-prone code. Alternatively, the PGAS model utilizes efficient one-sided communication and more intuitive communication primitives. In this paper, we present a novel approach that integrates PGAS concepts into the OpenMP programming model, leveraging the LLVM compiler infrastructure and the GASNet-EX communication library. Our model addresses the complexity associated with traditional MPI+OpenMP programming models while ensuring excellent performance and scalability. We evaluate our approach using a set of micro-benchmarks and application kernels on two distinct platforms: Ookami from Stony Brook University and NERSC Perlmutter. The results demonstrate that DiOMP achieves superior bandwidth and lower latency compared to MPI+OpenMP, with up to 25% higher bandwidth and latency down to 45%. DiOMP offers a promising alternative to the traditional MPI+OpenMP hybrid programming model, providing a more productive and efficient way to develop high-performance parallel applications for distributed memory systems.
AB - MPI+X has been the de facto standard for distributed memory parallel programming. It is used primarily as an explicit two-sided communication model, which often leads to complex and error-prone code. Alternatively, the PGAS model utilizes efficient one-sided communication and more intuitive communication primitives. In this paper, we present a novel approach that integrates PGAS concepts into the OpenMP programming model, leveraging the LLVM compiler infrastructure and the GASNet-EX communication library. Our model addresses the complexity associated with traditional MPI+OpenMP programming models while ensuring excellent performance and scalability. We evaluate our approach using a set of micro-benchmarks and application kernels on two distinct platforms: Ookami from Stony Brook University and NERSC Perlmutter. The results demonstrate that DiOMP achieves superior bandwidth and lower latency compared to MPI+OpenMP, with up to 25% higher bandwidth and latency down to 45%. DiOMP offers a promising alternative to the traditional MPI+OpenMP hybrid programming model, providing a more productive and efficient way to develop high-performance parallel applications for distributed memory systems.
KW - Distributed Computing
KW - MPI
KW - OpenMP
KW - PGAS
UR - https://www.scopus.com/pages/publications/85205392261
U2 - 10.1007/978-3-031-72567-8_5
DO - 10.1007/978-3-031-72567-8_5
M3 - Conference contribution
AN - SCOPUS:85205392261
SN - 9783031725661
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 64
EP - 78
BT - Advancing OpenMP for Future Accelerators - 20th International Workshop on OpenMP, IWOMP 2024, Proceedings
A2 - Espinosa, Alexis
A2 - Cytowski, Maciej
A2 - Klemm, Michael
A2 - de Supinski, Bronis R.
A2 - Klinkenberg, Jannis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 20th International Workshop on OpenMP, IWOMP 2024
Y2 - 23 September 2024 through 25 September 2024
ER -