Distributed processing of very large datasets with DataCutter

  • University of Maryland, College Park
  • Johns Hopkins University

Research output: Contribution to journal > Article > peer-review

153 Scopus citations

Abstract

DataCutter, a framework designed to support subsetting and processing of datasets in distributed, heterogeneous environments, is presented. Its use is illustrated with several data-intensive applications from diverse fields. The experimental results demonstrate the impact of heterogeneity on application performance and suggest that no static application organization is likely to perform efficiently in all cases. The DataCutter filtering service uses techniques such as careful placement of filters, multiple filter group instances, and transparent copies to adjust dynamically to the heterogeneity of the target runtime environment.

Original language: English
Pages (from-to): 1457-1478
Number of pages: 22
Journal: Parallel Computing
Volume: 27
Issue number: 11
State: Published - Oct 2001

Keywords

  • Component architectures
  • Data analysis
  • Distributed computing
  • Multi-dimensional datasets
  • Runtime systems
