PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

Publication Type

Conference Paper

Book Title

ICS '23: Proceedings of the 37th International Conference on Supercomputing

Publication Date

June, 2023

Page Numbers

167 to 179

Publisher Location

New York, New York, United States of America

Conference Name

International Conference on Supercomputing (ICS)

Conference Location

Orlando, Florida, United States of America

Conference Sponsor

ACM

Conference Date

Jun 21, 2023 - Jun 23, 2023

Abstract

Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel once for every time/algorithm step. The termination of each kernel implicitly acts as the barrier required after advancing the solution by one time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of the output of each time step in the unused registers and shared memory. PERKS can be generalized to any iterative solver: they are largely independent of the solver's implementation. We explain the design principle of PERKS and demonstrate the effectiveness of PERKS for a wide range of iterative 2D/3D stencil benchmarks (geomean speedup of 2.12x for 2D stencils and 1.24x for 3D stencils over state-of-the-art libraries), and a Krylov subspace conjugate gradient solver (geomean speedup of 4.86x on smaller SpMV datasets from SuiteSparse and 1.43x on larger SpMV datasets over a state-of-the-art library). All PERKS-based implementations are available at: https://github.com/neozhang307/PERKS.
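
The contrast between the host-side loop and the persistent-kernel scheme can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions, not the authors' implementation: the functions `jacobi_step`, `host_loop`, and `persistent_loop` and the parameter `cached_fraction` are hypothetical names. The sketch uses a 1D 3-point Jacobi stencil as a stand-in solver, counts traffic in elements, and ignores halo exchange between cached and uncached regions. It runs the same time steps both ways and tallies how many elements would cross the device-memory boundary in each case.

```python
def jacobi_step(src):
    """One 3-point Jacobi sweep; the two boundary values stay fixed."""
    dst = list(src)
    for i in range(1, len(src) - 1):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return dst


def host_loop(grid, steps):
    """Baseline model: one simulated kernel launch per time step.

    Each launch loads the whole domain from device memory and stores it
    back, so the modeled traffic is 2 * n * steps elements.
    """
    traffic = 0
    for _ in range(steps):
        traffic += len(grid)      # kernel loads its input
        grid = jacobi_step(grid)  # kernel exit acts as the barrier
        traffic += len(grid)      # kernel stores its output
    return grid, traffic


def persistent_loop(grid, steps, cached_fraction):
    """Persistent-kernel model: the time loop runs inside one launch.

    Assumption: a fixed fraction of the domain stays in registers/shared
    memory between steps, and only the rest goes through device memory.
    """
    n = len(grid)
    cached = int(n * cached_fraction)
    uncached = n - cached
    traffic = 0
    for step in range(steps):
        # The first step loads everything; later steps reload only the uncached part.
        traffic += n if step == 0 else uncached
        grid = jacobi_step(grid)
        # A real persistent kernel would place a device-wide barrier here.
        # The last step writes everything back; earlier steps store only the uncached part.
        traffic += n if step == steps - 1 else uncached
    return grid, traffic


if __name__ == "__main__":
    initial = [float(i % 7) for i in range(1000)]
    _, baseline = host_loop(initial, 100)
    _, persistent = persistent_loop(initial, 100, 0.5)
    print(baseline, persistent)
```

In this model, 1000 points advanced for 100 steps with half of the domain cached give 200000 element transfers for the host loop against 101000 for the persistent variant, while both produce identical results. The actual benefit on a GPU depends on how much register and shared-memory capacity is left unused and on the cost of the device-wide barrier, neither of which this sketch captures.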
