A MultiGPU Performance-Portable Solution for Array Programming Based on Kokkos

by Pedro Valero Lara, Jeffrey S. Vetter

Publication Type

Conference Paper

Book Title

ARRAY 2023

Publication Date

June, 2023

Page Numbers

1 to 12

Publisher Location

New York, New York, United States of America

Conference Name

ARRAY 2023: 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming

Conference Location

Orlando, Florida, United States of America

Conference Sponsor

the 44th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2023)

Conference Date

Jun 17, 2023 - Jun 21, 2023

Abstract

Today, multiGPU nodes are widely used in high-performance computing and data centers. However, current programming models do not provide simple, transparent, and portable support for automatically targeting multiple GPUs within a node in array programming applications. In this paper, we describe a new application programming interface based on the Kokkos programming model to enable array computation on multiple GPUs in a transparent and portable way across both NVIDIA and AMD GPUs. We implement different variations of this technique to accommodate the exchange of stencils (array boundaries) among different GPU memory spaces, and we provide autotuning, completely transparent to the programmer, that selects the proper number of GPUs depending on the computational cost of the operations to be computed on arrays. We evaluate our multiGPU extension on Summit (#5 TOP500), with six NVIDIA V100 Volta GPUs per node, and on Crusher, which contains hardware and software identical to those of Frontier (#1 TOP500), with four AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs), for a total of 8 GCDs per node. We also compare the performance of this solution against the use of MPI + Kokkos, which is the current de facto solution for multiple GPUs in Kokkos. Our evaluation shows that the new Kokkos solution provides good scalability for many GPUs and a faster and simpler solution (from a programming productivity perspective) than MPI + Kokkos.
