Clover: Compiler directed lightweight soft error resilience

  • Virginia Polytechnic Institute and State University
  • Oak Ridge National Laboratory

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

This paper presents Clover, a compiler-directed scheme for lightweight soft error detection and recovery. The compiler generates soft-error-tolerant code based on idempotent processing, without explicit checkpointing. During program execution, Clover relies on a small number of acoustic wave detectors deployed in the processor to identify soft errors by sensing the wave produced by a particle strike. To cope with detected unrecoverable errors (DUEs) caused by the sensing latency of error detection, Clover uses a novel selective instruction duplication technique called tail-DMR (dual modular redundancy). Once a soft error is detected by either the sensor or tail-DMR, Clover handles it in the same way as an exception: it recovers by redirecting program control to the beginning of the code region in which the error was detected. Experimental results show an average runtime overhead of only 26%, a 75% reduction compared to that of the state-of-the-art soft error resilience technique.
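The recovery idea in the abstract (re-running an idempotent region from its start once an error is detected) can be illustrated with a minimal sketch. This is not Clover's implementation: the names `SoftErrorDetected`, `run_region`, and `make_faulty_region` are hypothetical, detection is simulated with a Python exception rather than an acoustic sensor, and tail-DMR is not modeled at all.

```python
class SoftErrorDetected(Exception):
    """Hypothetical stand-in for a detection signal (sensor or tail-DMR)."""


def run_region(region, inputs, max_retries=3):
    """Run an idempotent region; on a detected error, restart it from the top.

    No checkpoint is taken: because the region never overwrites its inputs,
    re-executing it from the beginning yields the same result.
    Returns (result, number_of_attempts).
    """
    for attempt in range(1, max_retries + 2):
        try:
            return region(inputs), attempt
        except SoftErrorDetected:
            continue  # redirect control to the start of the region
    raise RuntimeError("region failed on every attempt")


def make_faulty_region(fail_times):
    """Build a region that reports a simulated soft error `fail_times` times."""
    state = {"remaining": fail_times}  # fault-injection bookkeeping only

    def region(inputs):
        total = sum(x * x for x in inputs)  # reads inputs, never modifies them
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise SoftErrorDetected()
        return total

    return region


result, attempts = run_region(make_faulty_region(2), [1, 2, 3])
```

In this sketch the region is restarted twice and completes on the third attempt. The key assumption is idempotence: a region that overwrote its own inputs could not be safely re-executed without a checkpoint.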

Original language: English
Article number: a2
Pages (from-to): 11-20
Number of pages: 10
Journal: ACM SIGPLAN Notices
Volume: 50
Issue number: 5
DOIs
State: Published - May 2015

Keywords

  • Acoustic wave detectors
  • Compilers
  • Idempotent processing
  • Soft error resilience
  • Tail-DMR frontier
