TY - GEN
T1 - A comparative survey of the HPC and big data paradigms
T2 - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
AU - Asaadi, Hamid Reza
AU - Khaldi, Dounia
AU - Chapman, Barbara
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12/6
Y1 - 2016/12/6
N2 - Many scientific data analytic applications need huge amounts of input, which can often consist of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of the hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing (HPC) models, under the leadership of MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and PageRank Benchmark has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is indeed needed or not, efficient or not, and in particular whether it is the best solution to tackle future computational challenges.
AB - Many scientific data analytic applications need huge amounts of input, which can often consist of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of the hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing (HPC) models, under the leadership of MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and PageRank Benchmark has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is indeed needed or not, efficient or not, and in particular whether it is the best solution to tackle future computational challenges.
UR - https://www.scopus.com/pages/publications/85013141051
U2 - 10.1109/CLUSTER.2016.21
DO - 10.1109/CLUSTER.2016.21
M3 - Conference contribution
AN - SCOPUS:85013141051
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 423
EP - 432
BT - Proceedings - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 September 2016 through 15 September 2016
ER -