Abstract
Data analysis is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly hindered by dataset sizes. The vast amount of data in scientific datasets makes it difficult to access the data of interest efficiently and to manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
| Original language | English |
|---|---|
| Pages (from-to) | 827-859 |
| Number of pages | 33 |
| Journal | Parallel Computing |
| Volume | 28 |
| Issue number | 5 |
| State | Published - May 2002 |
Keywords
- Data-intensive applications
- Distributed computing
- Multi-dimensional datasets
- Parallel processing
- Runtime systems
Title
Processing large-scale multi-dimensional data in parallel and distributed environments