
Task scheduling and file replication for data-intensive jobs with batch-shared I/O

  • Gaurav Khanna
  • Nagavijayalakshmi Vydyanathan
  • Umit Catalyurek
  • Tahsin Kurc
  • Sriram Krishnamoorthy
  • P. Sadayappan
  • Joel Saltz
  • Dept. of Computer Science and Engineering
  • Ohio State University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

This paper addresses the problem of efficiently executing a batch of data-intensive tasks with batch-shared I/O behavior on coupled storage and compute clusters. Two scheduling schemes are proposed: 1) a 0-1 Integer Programming (IP) based approach, which couples task scheduling and data replication, and 2) a bi-level hypergraph-partitioning-based heuristic (BiPartition), which decouples task scheduling and data replication. The experimental results show that 1) the IP scheme achieves the best batch execution time but has significant scheduling overhead, which restricts its application to small-scale workloads, and 2) the BiPartition scheme is a better fit for larger workloads and systems: it has very low scheduling overhead and no more than 5-10% degradation in solution quality compared with the IP-based approach.
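
The abstract does not give either formulation. As a rough illustration of the data-reuse trade-off that both schemes target, the sketch below models a batch as tasks that read shared files, which is the usual hypergraph view (tasks as vertices weighted by compute cost, files as nets weighted by size). It assumes every compute node must receive one copy of each file read by any task placed on it, and it uses a simple greedy placement, not hypergraph partitioning and not the paper's IP. All names (`staged_bytes`, `greedy_assign`, `imbalance`) are hypothetical.

```python
from collections import defaultdict


def staged_bytes(assignment, task_files, file_size):
    """Total bytes copied to compute nodes (assumed model): each node receives
    one copy of every file read by at least one task assigned to it."""
    files_on_node = defaultdict(set)
    for task, node in assignment.items():
        files_on_node[node].update(task_files[task])
    return sum(file_size[f] for files in files_on_node.values() for f in files)


def _new_bytes(files, staged_on_node, file_size):
    """Bytes a node would still need to fetch to run a task reading `files`."""
    return sum(file_size[f] for f in files if f not in staged_on_node)


def greedy_assign(task_files, task_cost, file_size, num_nodes, imbalance=0.1):
    """Hypothetical greedy heuristic (not the paper's BiPartition): place each
    task on the node where its not-yet-staged input is smallest, subject to a
    compute-load cap of (1 + imbalance) * average load."""
    cap = (1 + imbalance) * sum(task_cost.values()) / num_nodes
    load = [0.0] * num_nodes
    staged = [set() for _ in range(num_nodes)]
    assignment = {}
    # Visit tasks with the largest input first so big shared files anchor placement.
    order = sorted(task_files, key=lambda t: -sum(file_size[f] for f in task_files[t]))
    for t in order:
        candidates = [n for n in range(num_nodes) if load[n] + task_cost[t] <= cap]
        if not candidates:  # no node under the cap: fall back to the least loaded
            candidates = [min(range(num_nodes), key=lambda n: load[n])]
        # Prefer least new data, then lighter load, then lower node index.
        best = min(candidates,
                   key=lambda n: (_new_bytes(task_files[t], staged[n], file_size), load[n], n))
        assignment[t] = best
        load[best] += task_cost[t]
        staged[best].update(task_files[t])
    return assignment


if __name__ == "__main__":
    task_files = {"t1": {"A"}, "t2": {"A"}, "t3": {"B"}, "t4": {"B", "C"}}
    task_cost = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
    file_size = {"A": 100, "B": 100, "C": 10}
    plan = greedy_assign(task_files, task_cost, file_size, num_nodes=2)
    print(plan, staged_bytes(plan, task_files, file_size))
```

On this toy batch the greedy placement co-locates the two readers of `A` and the two readers of `B`, so each file is staged exactly once (210 bytes), while a placement that splits both pairs stages 410 bytes. The cost `staged_bytes` is the connectivity metric (file size times the number of nodes that need the file); the connectivity-minus-one metric minimized by hypergraph partitioners differs from it only by the constant total size of the files read, so both rank placements identically.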

Original language: English
Title of host publication: Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15
Pages: 241-252
Number of pages: 12
State: Published - 2006
Event: 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15 - Paris, France
Duration: Jun 19 2006 - Jun 23 2006

Publication series

Name: Proceedings of the IEEE International Symposium on High Performance Distributed Computing
Volume: 2006
ISSN (Print): 1082-8907

Conference

Conference: 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15
Country/Territory: France
City: Paris
Period: 06/19/06 - 06/23/06
