TY - GEN
T1 - Efficient Execution of Dynamic Programming Algorithms on Apache Spark
AU - Mahdi Javanmard, Mohammad
AU - Ahmad, Zafar
AU - Zola, Jaroslaw
AU - Pouchet, Louis Noel
AU - Chowdhury, Rezaul
AU - Harrison, Robert
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - One of the most important properties of distributed computing systems (e.g., Apache Spark, Apache Hadoop, etc) on clusters and computation clouds is the ability to scale out by adding more compute nodes to the cluster. This important feature can lead to performance gain provided the computation (or the algorithm) itself can scale out. In other words, the computation (or the algorithm) should be easily decomposable into smaller units of work to be distributed among the workers based on the hardware/software configuration of the cluster or the cloud. Additionally, on such clusters, there is an important trade-off between communication cost, parallelism, and memory requirement. Due to the scalability need as well as this trade-off, it is crucial to have a well-decomposable, adaptive, tunable, and scalable program. Tunability enables the programmer to find an optimal point in the trade-off spectrum to execute the program efficiently on a specific cluster. We design and implement well-decomposable and tunable dynamic programming algorithms from the Gaussian Elimination Paradigm (GEP), such as Floyd-Warshall's all-pairs shortest path and Gaussian elimination without pivoting, for execution on Apache Spark. Our implementations are based on parametric multi-way recursive divide-conquer algorithms. We explain how to map implementations of those grid-based parallel algorithms to the Spark framework. Finally, we provide experimental results illustrating the performance, scalability, and portability of our Spark programs. We show that offloading the computation to an OpenMP environment (by running parallel recursive kernels) within Spark is at least partially responsible for a 2-5× speedup of the DP benchmarks.
AB - One of the most important properties of distributed computing systems (e.g., Apache Spark, Apache Hadoop, etc) on clusters and computation clouds is the ability to scale out by adding more compute nodes to the cluster. This important feature can lead to performance gain provided the computation (or the algorithm) itself can scale out. In other words, the computation (or the algorithm) should be easily decomposable into smaller units of work to be distributed among the workers based on the hardware/software configuration of the cluster or the cloud. Additionally, on such clusters, there is an important trade-off between communication cost, parallelism, and memory requirement. Due to the scalability need as well as this trade-off, it is crucial to have a well-decomposable, adaptive, tunable, and scalable program. Tunability enables the programmer to find an optimal point in the trade-off spectrum to execute the program efficiently on a specific cluster. We design and implement well-decomposable and tunable dynamic programming algorithms from the Gaussian Elimination Paradigm (GEP), such as Floyd-Warshall's all-pairs shortest path and Gaussian elimination without pivoting, for execution on Apache Spark. Our implementations are based on parametric multi-way recursive divide-conquer algorithms. We explain how to map implementations of those grid-based parallel algorithms to the Spark framework. Finally, we provide experimental results illustrating the performance, scalability, and portability of our Spark programs. We show that offloading the computation to an OpenMP environment (by running parallel recursive kernels) within Spark is at least partially responsible for a 2-5× speedup of the DP benchmarks.
KW - Apache Spark
KW - Communication-efficiency
KW - Distributed-memory Computing
KW - Dynamic Programming
KW - I/O-efficiency
KW - Polyhedral Compilation
KW - Recursive Divide-Conquer
UR - https://www.scopus.com/pages/publications/85096215742
U2 - 10.1109/CLUSTER49012.2020.00044
DO - 10.1109/CLUSTER49012.2020.00044
M3 - Conference contribution
AN - SCOPUS:85096215742
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 337
EP - 348
BT - Proceedings - 2020 IEEE International Conference on Cluster Computing, CLUSTER 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Conference on Cluster Computing, CLUSTER 2020
Y2 - 14 September 2020 through 17 September 2020
ER -