Skip to main navigation Skip to search Skip to main content

Efficient Execution of Dynamic Programming Algorithms on Apache Spark

  • Stony Brook University
  • SUNY Buffalo
  • Colorado State University System

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

One of the most important properties of distributed computing systems (e.g., Apache Spark, Apache Hadoop, etc) on clusters and computation clouds is the ability to scale out by adding more compute nodes to the cluster. This important feature can lead to performance gain provided the computation (or the algorithm) itself can scale out. In other words, the computation (or the algorithm) should be easily decomposable into smaller units of work to be distributed among the workers based on the hardware/software configuration of the cluster or the cloud. Additionally, on such clusters, there is an important trade-off between communication cost, parallelism, and memory requirement. Due to the scalability need as well as this trade-off, it is crucial to have a well-decomposable, adaptive, tunable, and scalable program. Tunability enables the programmer to find an optimal point in the trade-off spectrum to execute the program efficiently on a specific cluster. We design and implement well-decomposable and tunable dynamic programming algorithms from the Gaussian Elimination Paradigm (GEP), such as Floyd-Warshall's all-pairs shortest path and Gaussian elimination without pivoting, for execution on Apache Spark. Our implementations are based on parametric multi-way recursive divide-conquer algorithms. We explain how to map implementations of those grid-based parallel algorithms to the Spark framework. Finally, we provide experimental results illustrating the performance, scalability, and portability of our Spark programs. We show that offloading the computation to an OpenMP environment (by running parallel recursive kernels) within Spark is at least partially responsible for a 2-5× speedup of the DP benchmarks.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE International Conference on Cluster Computing, CLUSTER 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages337-348
Number of pages12
ISBN (Electronic)9781728166773
DOIs
StatePublished - Sep 2020
Event22nd IEEE International Conference on Cluster Computing, CLUSTER 2020 - Kobe, Japan
Duration: Sep 14 2020Sep 17 2020

Publication series

NameProceedings - IEEE International Conference on Cluster Computing, ICCC
Volume2020-September
ISSN (Print)1552-5244

Conference

Conference22nd IEEE International Conference on Cluster Computing, CLUSTER 2020
Country/TerritoryJapan
CityKobe
Period09/14/2009/17/20

Keywords

  • Apache Spark
  • Communication-efficiency
  • Distributed-memory Computing
  • Dynamic Programming
  • I/O-efficiency
  • Polyhedral Compilation
  • Recursive Divide-Conquer

Fingerprint

Dive into the research topics of 'Efficient Execution of Dynamic Programming Algorithms on Apache Spark'. Together they form a unique fingerprint.

Cite this