
Processing large-scale multi-dimensional data in parallel and distributed environments

  • Michael Beynon
  • Chialin Chang
  • Umit Catalyurek
  • Tahsin Kurc
  • Alan Sussman
  • Henrique Andrade
  • Renato Ferreira
  • Joel Saltz
  • University of Maryland, College Park
  • Ohio State University

Research output: Contribution to journal, Article, peer-review

48 Scopus citations

Abstract

Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.

Original language: English
Pages (from-to): 827-859
Number of pages: 33
Journal: Parallel Computing
Volume: 28
Issue number: 5
State: Published - May 2002

Keywords

  • Data-intensive applications
  • Distributed computing
  • Multi-dimensional datasets
  • Parallel processing
  • Runtime systems
