Using space and attribute partitioned partial replicas for data subsetting and aggregation queries

  • Ohio State University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Scopus citations

Abstract

Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. In this paper we investigate methods for efficient execution of queries when replicas of a dataset exist; we assume the replicas have already been created and do not target the replica creation problem. We propose a cost model and algorithm for combined use of space partitioned and attribute partitioned replicas for executing data subsetting range queries. We extend the cost model and propose a greedy algorithm to address range queries with aggregation operations. The extended replica selection algorithm allows uneven partitioning of replicas across storage nodes. Different replicas can be partitioned across different subsets of storage nodes. We have implemented these techniques as part of an automatic data virtualization system and have evaluated the benefits of our techniques using this system. We demonstrate the efficacy of the algorithms on parallel machines using queries on datasets from oil reservoir simulation studies and satellite data processing applications.

Original language: English
Title of host publication: ICPP 2006
Subtitle of host publication: Proceedings of the 2006 International Conference on Parallel Processing
Pages: 271-278
Number of pages: 8
DOIs
State: Published - 2006
Event: ICPP 2006: 2006 International Conference on Parallel Processing - Columbus, OH, United States
Duration: Aug 14 2006 to Aug 18 2006

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918

Conference

Conference: ICPP 2006: 2006 International Conference on Parallel Processing
Country/Territory: United States
City: Columbus, OH
Period: 08/14/06 to 08/18/06
