Skip to main navigation Skip to search Skip to main content

Improving parallelism of recursive stencil computations without sacrificing cache performance

  • Fudan University
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

The state-of-the-art "trapezoidal decomposition algorithm" for stencil computations on modern multicore machines use recursive divide-and-conquer (DAC) to achieve asymptotically optimal cache complexity cache-obliviously. But the same DAC approach restricts parallelism by introducing artificial dependencies among subtasks in addition to those arising from the defining stencil equations. As a result, the trapezoidal decomposition algorithm has suboptimal parallelism. In this paper we present a variant of the parallel trapezoidal decomposition algorithm called "cache-oblivious wavefront" (COW) that starts execution of recursive subtasks earlier than the start time prescribed by the original algorithm without violating any real dependencies implied by the underlying recurrences, and thus reducing serialization due to artificial dependencies. The reduction in serialization leads to an improvement in parallelism. Moreover, since we do not change the DAC-based decomposition of tasks used in the original algorithm, cache performance does not suffer. We provide experimental measurements of absolute running times, burdened span by Cilkview, and L1/L2 cache misses by PAPI to validate our claims.

Original languageEnglish
Title of host publicationWOSC 2014 - Proceedings of the 2014 ACM SIGPLAN Workshop on Stencil Computations, Part of SPLASH 2014
PublisherAssociation for Computing Machinery, Inc
Pages1-7
Number of pages7
ISBN (Electronic)9781450323086
DOIs
StatePublished - Oct 20 2014
Event2014 ACM SIGPLAN 2nd Workshop on Stencil Computations, WOSC 2014 - Portland, United States
Duration: Oct 20 2014Oct 20 2014

Publication series

NameWOSC 2014 - Proceedings of the 2014 ACM SIGPLAN Workshop on Stencil Computations, Part of SPLASH 2014

Conference

Conference2014 ACM SIGPLAN 2nd Workshop on Stencil Computations, WOSC 2014
Country/TerritoryUnited States
CityPortland
Period10/20/1410/20/14

Keywords

  • Atomic operation
  • Cacheoblivious wave front
  • Cilk
  • Multi-core
  • Nested parallel computation
  • Parallel cache-oblivious algorithm
  • Stencil

Fingerprint

Dive into the research topics of 'Improving parallelism of recursive stencil computations without sacrificing cache performance'. Together they form a unique fingerprint.

Cite this