A MultiGPU Performance-Portable Solution for Array Programming Based on Kokkos

by Pedro Valero Lara, Jeffrey S Vetter
Publication Type
Conference Paper
Book Title
ARRAY 2023
Publication Date
Page Numbers
1 to 12
Publisher Location
New York, New York, United States of America
Conference Name
ARRAY 2023: 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming
Conference Location
Orlando, Florida, United States of America
Conference Sponsor
the 44th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2023)
Conference Date

Today, multiGPU nodes are widely used in high-performance computing and data centers. However, current programming models do not provide simple, transparent, and portable support for automatically targeting multiple GPUs within a node in array programming applications. In this paper, we describe a new application programming interface based on the Kokkos programming model to enable array computation on multiple GPUs in a transparent and portable way across both NVIDIA and AMD GPUs. We implement different variations of this technique to accommodate the exchange of stencils (array boundaries) among different GPU memory spaces, and we provide autotuning that selects the proper number of GPUs according to the computational cost of the array operations and is completely transparent to the programmer. We evaluate our multiGPU extension on Summit (#5 TOP500), with six NVIDIA V100 Volta GPUs per node, and on Crusher, which has the same hardware and software as Frontier (#1 TOP500), with four AMD MI250X GPUs per node, each with two Graphics Compute Dies (GCDs), for a total of eight GCDs per node. We also compare the performance of this solution against MPI + Kokkos, which is the current de facto solution for multiple GPUs in Kokkos. Our evaluation shows that the new Kokkos solution scales well across many GPUs and is faster and simpler (from a programming productivity perspective) than MPI + Kokkos.
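
The exchange of stencils (array boundaries) mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's API and does not use Kokkos: each GPU memory space is modeled as a plain Python list, and every function name here (`split_with_halos`, `exchange_halos`, `stencil_step`, `multi_device_step`, `single_device_step`) is hypothetical. The sketch assumes a 1D array, a 3-point averaging stencil, and one ghost cell on each side of every partition.

```python
# Hypothetical sketch: each "device" is modeled as a plain Python list.
# None of these names come from Kokkos or from the paper.

def split_with_halos(data, num_devices):
    """Partition data into contiguous chunks, each padded with one ghost cell per side."""
    if not 1 <= num_devices <= len(data):
        raise ValueError("num_devices must be between 1 and len(data)")
    base, extra = divmod(len(data), num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        chunks.append([0.0] + list(data[start:start + size]) + [0.0])
        start += size
    return chunks

def exchange_halos(chunks):
    """Copy boundary values between neighboring chunks (the stencil exchange)."""
    for d in range(len(chunks) - 1):
        left, right = chunks[d], chunks[d + 1]
        right[0] = left[-2]   # last interior value of left -> right's ghost cell
        left[-1] = right[1]   # first interior value of right -> left's ghost cell

def stencil_step(chunk):
    """Apply a 3-point average to interior cells; ghost cells are only read."""
    interior = [(chunk[i - 1] + chunk[i] + chunk[i + 1]) / 3.0
                for i in range(1, len(chunk) - 1)]
    return [chunk[0]] + interior + [chunk[-1]]

def multi_device_step(data, num_devices):
    """Run one stencil step over data split across num_devices partitions."""
    chunks = split_with_halos(data, num_devices)
    exchange_halos(chunks)
    result = []
    for chunk in chunks:
        result.extend(stencil_step(chunk)[1:-1])  # drop ghost cells
    return result

def single_device_step(data):
    """Reference: the same stencil step on the unpartitioned array."""
    padded = [0.0] + list(data) + [0.0]
    return stencil_step(padded)[1:-1]
```

In this model, the partitioned step produces the same values as the unpartitioned one for any partition count up to the array length, which is the property a transparent multiGPU layer would need to preserve. On real hardware the two assignments in `exchange_halos` would correspond to copies between GPU memory spaces, and the choice of `num_devices` is what the autotuning described above would make on the programmer's behalf.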