
A comparative survey of the HPC and big data paradigms: Analysis and experiments

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

Many scientific data analytic applications need huge amounts of input, which can often consist of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of the hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing (HPC) models, under the leadership of MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and PageRank Benchmark has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is indeed needed or not, efficient or not, and in particular whether it is the best solution to tackle future computational challenges.

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 423-432
Number of pages: 10
ISBN (Electronic): 9781509036530
State: Published - Dec 6 2016
Event: 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016 - Taipei, Taiwan, Province of China
Duration: Sep 13 2016 → Sep 15 2016

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 09/13/16 → 09/15/16
