Skip to main navigation Skip to search Skip to main content

SHF: AF: Small: Algorithms and a Code Generator for Faster Stencil Computations

Project: Research

Project Details

Description

A wide range of application areas across industry and scientific computing use stencil computations, including the simulation of mechanical systems, traffic flows, meteorology, stochastic and fractional differential equations, chemistry, modeling of erosion, fluid dynamics, quantitative finance, physical simulations, and even cellular automata. A stencil defines the value of a grid cell in a spatial grid at any time step t by combining the values of a fixed constant-sized set of neighboring cells at recent time steps earlier than t. Cells near the boundary of a bounded grid are defined by additional conditions, since they lack the neighboring cells necessary for the stencil to use. A stencil computation is one in which a grid filled with initial data is evolved by repeatedly applying a stencil on the grid cells for a given number of time steps. Stencil computations are easy to implement using nested loops and such implementations perform work proportional to N*T for updating a grid of size N for T time steps. Running on multiple processing cores of a multicore machine and/or using cache-efficient techniques to reduce costly accesses to the slow computer memory (random access memory, RAM) during the computation can result in much smaller running times. But the work (a.k.a. computational complexity) which represents the total resource usage (e.g., total energy consumed or the total processing time across all processing cores) still remains proportional to N*T. In contrast, building on the recent prior work by the investigator's research group, this project will improve over many of these state-of-the-art stencil algorithms not only by performing a significantly less amount of work W(N,T) for any given N and T, but also guaranteeing a more desirable property -- the ratio W(N,T)/(N*T) decreases with the increase of N*T. The resulting algorithms will not only run faster than existing algorithms on modern multicore computers, they will significantly reduce the overall resource (e.g., time, energy) usage of many classes of stencil computations. A code generator that can automatically produce highly efficient correct parallel implementations of the new algorithms from simple specifications of stencil computations will be implemented. An autotuner for further automatic performance optimization of those implementations on any target multicore machine will also be built. Thus, this project will bring high-performing resource-saving stencil computations to a wider audience of scientists, students, and educators, without requiring them to fully understand the complicated algorithms or invest time and effort necessary to implement such algorithms on modern computing systems. Findings and research problems from this project will be integrated in the advanced graduate courses. Undergraduate and graduate students will be involved in the research project. The investigator's research group has recently shown polynomial improvements in computational complexity over state-of-the-art stencil algorithms for linear stencils with arbitrary aperiodic boundary conditions using the Fast Fourier Transform (FFT). Stencils that define grid values for the current time step as a linear (resp. nonlinear) function of grid values from earlier time steps are called linear (resp. nonlinear) stencils. Very recently the group has also shown that linear stencil computations under the freespace boundary condition can be well-approximated very efficiently in work and space through a novel use of Gaussian approximations and n-body computations. Implementations of most of these new algorithms run significantly faster than the fastest implementations of existing algorithms. This project will build on those results and explore other novel ideas to tackle the following algorithm design challenges: (1) fast approximation algorithms for linear periodic/aperiodic stencil computations, (2) faster exact algorithms for linear stencils under special boundary conditions (e.g., Dirichlet, Neumann), (3) fast algorithms for classes of nonlinear stencils, and (4) fast algorithms for heterogeneous stencil grids. Designing algorithms with improved computational complexity will be the core challenge and objective across all classes of stencils. Further improvements will come from reducing the span and cache (or input/output, I/O) complexity. If there cannot exist algorithms with improved bounds (work, span, or I/O) for certain classes, the aim will be to prove asymptotically tight lower bounds. In many time-sensitive simulations (e.g., weather forecasting) fast algorithms with approximately correct results are more desirable than slow algorithms with exact results. Hence, fast approximation algorithms will be designed for potential use in such cases. A specialized software stack named "Saltar" will be built to generate efficient multicore implementations of the algorithms to be designed in this project. The Saltar domain-specific language (DSL) will allow users to write down simple specifications of the stencil problems they would like to solve, and the Saltar code generator will take the specification to generate highly optimized multithreaded OpenCilk/OpenMP code. The Saltar autotuner will be built for automatic offline tuning of the parallel recursive divide-and-conquer implementations generated by the code generator. Saltar will be released under an open-source license (BSD 3-clause or similar permissive licensing allowing use by industry). A public web interface will be built to evaluate the success of the project and the usability of Saltar, and a benchmark suite will be created by collecting stencil problems from various sources including potentially from the web users (with permission). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
StatusActive
Effective start/end date07/3/2309/30/26

Funding

  • National Science Foundation: $592,987.00

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.