TY - GEN
T1 - Evaluating Tuning Opportunities of the LLVM/OpenMP Runtime
AU - Chheda, Smeet
AU - Verma, Gaurav
AU - Tian, Shilei
AU - Chapman, Barbara
AU - Doerfert, Johannes
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Tuning parallel applications on multi-core architectures is an arduous task. Several studies have applied auto-tuning to OpenMP applications via standardized user-facing features, namely number of threads, thread placement, binding, and scheduling policy. However, they fall short of utilizing the additional parameters provided by an OpenMP implementation. In this paper, we analyze OpenMP application runtime through an exhaustive exploration of all relevant configuration options of the LLVM/OpenMP runtime. Our findings allow us to identify trends in tuning potential, architecture-aware tuning suggestions, and good default configurations per architecture. We will open-source the 240,000 unique samples collected during experiments for use by the community. These runs were conducted on three different CPU architectures vital in the HPC and datacenter community. The applications include popular benchmark suites and microbenchmarks, namely NAS Parallel Benchmarks, Barcelona OpenMP Task Suite, XSBench, RSBench, SU3Bench, and LULESH. We employ the Linear Models class of machine learning algorithms to analyze, explain, and form qualitative relations between features comprising the underlying architecture, application, input size, number of threads, and the considered environment variables. These relations are then used to recommend configurations for a given application type and architecture.
AB - Tuning parallel applications on multi-core architectures is an arduous task. Several studies have applied auto-tuning to OpenMP applications via standardized user-facing features, namely number of threads, thread placement, binding, and scheduling policy. However, they fall short of utilizing the additional parameters provided by an OpenMP implementation. In this paper, we analyze OpenMP application runtime through an exhaustive exploration of all relevant configuration options of the LLVM/OpenMP runtime. Our findings allow us to identify trends in tuning potential, architecture-aware tuning suggestions, and good default configurations per architecture. We will open-source the 240,000 unique samples collected during experiments for use by the community. These runs were conducted on three different CPU architectures vital in the HPC and datacenter community. The applications include popular benchmark suites and microbenchmarks, namely NAS Parallel Benchmarks, Barcelona OpenMP Task Suite, XSBench, RSBench, SU3Bench, and LULESH. We employ the Linear Models class of machine learning algorithms to analyze, explain, and form qualitative relations between features comprising the underlying architecture, application, input size, number of threads, and the considered environment variables. These relations are then used to recommend configurations for a given application type and architecture.
KW - HPC
KW - machine learning
KW - parallel programming
KW - tuning
UR - https://www.scopus.com/pages/publications/85217184290
U2 - 10.1109/SCW63240.2024.00131
DO - 10.1109/SCW63240.2024.00131
M3 - Conference contribution
AN - SCOPUS:85217184290
T3 - Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 919
EP - 929
BT - Proceedings of SC 2024-W
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Y2 - 17 November 2024 through 22 November 2024
ER -