TY - GEN
T1 - Adaptively Accelerating Map-Reduce/Spark with GPUs
T2 - 16th IEEE International Conference on Autonomic Computing, ICAC 2019
AU - Jayaram, K. R.
AU - Gandhi, Anshul
AU - Xin, Hongyi
AU - Tao, Shu
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in stock Hadoop map-reduce and Apache Spark. In particular, we describe a technique that enables data-parallel tasks in map-reduce and Spark to be dynamically and adaptively scheduled on CPU or GPU, based on availability and load. We examine the extent of the performance improvements and correlate them with various parameters of the algorithms studied. We focus on end-to-end performance impact, including the overheads of transferring data into and out of the GPU and of converting between data representations in the JVM and on the GPU. We also present three optimizations that, in our analysis, can be generalized across many iterative machine learning applications. We present a case study in which we accelerate four iterative machine learning applications (multinomial logistic regression, multiple linear regression, K-Means clustering, and principal components analysis using singular value decomposition) implemented in three data analytics frameworks: Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R), and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, on M3R by up to 18X, and on Spark by up to 25X. Through our empirical analysis, we offer several insights that can be helpful in designing middleware and cluster managers to accelerate map-reduce and Spark applications using GPUs.
AB - In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in stock Hadoop map-reduce and Apache Spark. In particular, we describe a technique that enables data-parallel tasks in map-reduce and Spark to be dynamically and adaptively scheduled on CPU or GPU, based on availability and load. We examine the extent of the performance improvements and correlate them with various parameters of the algorithms studied. We focus on end-to-end performance impact, including the overheads of transferring data into and out of the GPU and of converting between data representations in the JVM and on the GPU. We also present three optimizations that, in our analysis, can be generalized across many iterative machine learning applications. We present a case study in which we accelerate four iterative machine learning applications (multinomial logistic regression, multiple linear regression, K-Means clustering, and principal components analysis using singular value decomposition) implemented in three data analytics frameworks: Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R), and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, on M3R by up to 18X, and on Spark by up to 25X. Through our empirical analysis, we offer several insights that can be helpful in designing middleware and cluster managers to accelerate map-reduce and Spark applications using GPUs.
KW - Acceleration
KW - data analytics
KW - GPU
KW - Hadoop
KW - Map Reduce
KW - Spark
UR - https://www.scopus.com/pages/publications/85073202645
U2 - 10.1109/ICAC.2019.00022
DO - 10.1109/ICAC.2019.00022
M3 - Conference contribution
AN - SCOPUS:85073202645
T3 - Proceedings - 2019 IEEE International Conference on Autonomic Computing, ICAC 2019
SP - 105
EP - 114
BT - Proceedings - 2019 IEEE International Conference on Autonomic Computing, ICAC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 June 2019 through 20 June 2019
ER -