
Adaptively Accelerating Map-Reduce/Spark with GPUs: A Case Study

  • IBM
  • Carnegie Mellon University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

7 Scopus citations

Abstract

In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in stock Hadoop map-reduce and Apache Spark. In particular, we describe a technique that enables data-parallel tasks in map-reduce and Spark to be dynamically and adaptively scheduled on CPU or GPU, based on availability and load. We examine the extent of the performance improvements and correlate them with various parameters of the algorithms studied. We focus on end-to-end performance impact, including the overheads of transferring data into and out of the GPU and of converting between data representations in the JVM and on the GPU. We also present three optimizations that, in our analysis, can be generalized across many iterative machine learning applications. We present a case study in which we accelerate four iterative machine learning applications (multinomial logistic regression, multiple linear regression, K-Means clustering, and principal components analysis using singular value decomposition) implemented in three data analytics frameworks: Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R), and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, on M3R by up to 18X, and on Spark by up to 25X. Through our empirical analysis, we offer several insights that can be helpful in designing middleware and cluster managers to accelerate map-reduce and Spark applications using GPUs.
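The abstract does not spell out how tasks are routed between CPU and GPU, so the following is only a minimal sketch of one way availability-based dispatch could look, not the authors' mechanism. It assumes a fixed number of GPU slots guarded by a semaphore; each data-parallel task takes a slot if one is free and otherwise falls back to the CPU. The names `AdaptiveDispatcher`, `cpu_sum_squares`, `gpu_sum_squares`, and `map_reduce_sum_squares` are hypothetical, and the "GPU" function is a plain-Python stand-in rather than a real kernel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AdaptiveDispatcher:
    """Route each task to a GPU slot if one is free, otherwise to the CPU."""

    def __init__(self, gpu_slots):
        # Assumption: one semaphore permit per concurrently usable GPU slot.
        self._gpu = threading.BoundedSemaphore(gpu_slots)
        self._lock = threading.Lock()
        self.counts = {"gpu": 0, "cpu": 0}

    def run(self, partition, cpu_fn, gpu_fn):
        # Non-blocking acquire: never wait on a busy GPU, use the CPU instead.
        if self._gpu.acquire(blocking=False):
            try:
                result = gpu_fn(partition)
            finally:
                self._gpu.release()
            device = "gpu"
        else:
            result = cpu_fn(partition)
            device = "cpu"
        with self._lock:
            self.counts[device] += 1
        return result


def cpu_sum_squares(xs):
    # CPU version of the per-partition "map" work.
    return sum(x * x for x in xs)


def gpu_sum_squares(xs):
    # Hypothetical stand-in for a GPU kernel launch plus host/device transfers.
    return sum(x * x for x in xs)


def map_reduce_sum_squares(partitions, dispatcher, workers=4):
    # Map: one task per partition, dispatched adaptively. Reduce: add partials.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda p: dispatcher.run(p, cpu_sum_squares, gpu_sum_squares),
            partitions,
        )
        return sum(partials)
```

Because both code paths compute the same partial result, the reduce step is indifferent to where each task ran; `dispatcher.counts` records the split, which depends on thread timing. A real system would also have to account for the data-transfer and JVM-to-GPU conversion overheads that the abstract highlights.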

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Autonomic Computing, ICAC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 105-114
Number of pages: 10
ISBN (Electronic): 9781728124117
DOIs
State: Published - Jun 2019
Event: 16th IEEE International Conference on Autonomic Computing, ICAC 2019 - Umea, Sweden
Duration: Jun 16 2019 to Jun 20 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Autonomic Computing, ICAC 2019

Conference

Conference: 16th IEEE International Conference on Autonomic Computing, ICAC 2019
Country/Territory: Sweden
City: Umea
Period: 06/16/19 to 06/20/19

Keywords

  • Acceleration
  • data analytics
  • GPU
  • Hadoop
  • Map Reduce
  • Spark
