Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications

  • Cong Xie
  • Wonyong Jeong
  • Gyorgy Matyasfalvi
  • Hubertus Van Dam
  • Klaus Mueller
  • Shinjae Yoo
  • Wei Xu
  • Stony Brook University
  • Brookhaven National Laboratory

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integration of anomaly detection and visualization is necessary to support human-centered analysis, such as verification of candidate anomalies using domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows, targeting the exascale scenario. Our approach uses a call stack tree representation to encode the structural and temporal information of function executions. Based on call stack tree features (e.g., the execution time of the root function or a vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration in a level-of-detail manner. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified in a real-world HPC application.
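
The abstract does not specify the detection algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: each function execution is a node in a call stack tree, the root's execution time is the feature, and a running mean and standard deviation (Welford's method) with a k-sigma threshold stands in for the online detector. The names `CallNode`, `RunningStats`, `tree_vector`, and `is_candidate_anomaly`, and the parameters `k` and `warmup`, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallNode:
    """One function execution in a call stack tree (hypothetical structure)."""
    name: str
    entry_time: float  # seconds
    exit_time: float   # seconds
    children: List["CallNode"] = field(default_factory=list)

    @property
    def runtime(self) -> float:
        return self.exit_time - self.entry_time


def tree_vector(root: CallNode) -> Counter:
    """Assumed vector representation: how often each function appears in the tree."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.name] += 1
        stack.extend(node.children)
    return counts


class RunningStats:
    """Welford's online mean/variance; no past samples are stored."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_candidate_anomaly(stats: RunningStats, root: CallNode,
                         k: float = 3.0, warmup: int = 10) -> bool:
    """Flag the root execution if its runtime deviates more than k sigma
    from the statistics seen so far, then fold it into the statistics."""
    x = root.runtime
    flagged = (stats.n >= warmup and stats.std > 0
               and abs(x - stats.mean) > k * stats.std)
    stats.update(x)
    return flagged
```

In a streaming setting, one `RunningStats` instance would typically be kept per function name, and flagged trees would be passed on to the visualization for verification by a domain expert.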

Original language: English
Title of host publication: Computational Science – ICCS 2019 - 19th International Conference, Proceedings
Editors: João M.F. Rodrigues, Pedro J.S. Cardoso, Jânio Monteiro, Roberto Lam, Valeria V. Krzhizhanovskaya, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra
Publisher: Springer Verlag
Pages: 153-167
Number of pages: 15
ISBN (Print): 9783030227333
DOIs
State: Published - 2019
Event: 19th International Conference on Computational Science, ICCS 2019 - Faro, Portugal
Duration: Jun 12 2019 to Jun 14 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11536 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Computational Science, ICCS 2019
Country/Territory: Portugal
City: Faro
Period: 06/12/19 to 06/14/19

Keywords

  • Anomaly detection
  • High Performance Computing
  • Streaming analysis
  • Trace events
  • Visual analytics
