Developing Ultrahigh-Resolution E3SM Land Model for GPU Systems

by Peter D Schwartz, Dali Wang, Fengming Yuan, Peter E Thornton

Publication Type

Conference Paper

Book Title

Computational Science and Its Applications – ICCSA 2023

Publication Date

July 2023

Page Numbers

277 to 290

Volume

13956

Publisher Location

Cham, Switzerland

Conference Name

International Conference on Computational Science and Its Applications (ICCSA)

Conference Location

Athens, Greece

Conference Sponsor

Various

Conference Date

Jul 3, 2023 - Jul 6, 2023

Abstract

Designing and refactoring complex scientific code, such as the E3SM land model (ELM), for new computing architectures is challenging. This paper presents design strategies and technical approaches for developing a data-oriented, GPU-ready ELM using compiler directives (OpenACC/OpenMP). We first analyze the datatypes and processes in the original ELM code. We then present design considerations for developing an ultrahigh-resolution ELM (uELM) for massive GPU systems. The techniques include a global data-oriented simulation workflow, domain partitioning, code porting and data copying, memory reduction, parallel loop restructuring and flattening, and race condition detection. We implemented the first version of uELM using OpenACC, targeting the NVIDIA GPUs of the Summit supercomputer at Oak Ridge National Laboratory. During the implementation, we developed a software tool, SPEL, to facilitate code generation, verification, and performance tuning with these techniques. The first uELM implementation for NVIDIA GPUs on Summit delivered promising results: over 98% of the ELM code was automatically generated and tuned by scripts, most ELM modules achieved better computational performance than the original ELM code for CPUs, and the GPU-ready uELM is more scalable than the CPU code on fully loaded Summit nodes. Example profiling results from several modules are also presented to illustrate the performance improvements and race condition detection. The lessons learned and the toolkit developed in this study are also suitable for further uELM deployment using OpenMP on Frontier, the first US exascale computer, which is equipped with AMD CPUs and GPUs.
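As a rough illustration of two of the techniques the abstract names, loop flattening and avoiding race conditions under OpenACC, the sketch below may help. It is not taken from uELM or SPEL: ELM itself is Fortran, this sketch is in C, and the array sizes and function names (NCELLS, NLEVS, scale_nested, scale_flat, total_sum) are hypothetical. A compiler without OpenACC support ignores the directives and runs the loops serially.

```c
/* Hypothetical problem size, chosen only for illustration. */
enum { NCELLS = 4, NLEVS = 3 };

/* Nested form: outer loop over grid cells, inner loop over soil layers.
 * collapse(2) asks the compiler to treat both loops as one iteration space. */
void scale_nested(double *field, double factor)
{
    #pragma acc parallel loop collapse(2) copy(field[0:NCELLS*NLEVS])
    for (int c = 0; c < NCELLS; ++c) {
        for (int j = 0; j < NLEVS; ++j) {
            field[c * NLEVS + j] *= factor;
        }
    }
}

/* Manually flattened form: a single loop over the combined index,
 * which exposes NCELLS*NLEVS independent iterations to the GPU. */
void scale_flat(double *field, double factor)
{
    #pragma acc parallel loop copy(field[0:NCELLS*NLEVS])
    for (int idx = 0; idx < NCELLS * NLEVS; ++idx) {
        field[idx] *= factor;
    }
}

/* Every iteration updates the same accumulator. Without the reduction
 * clause, parallel iterations would race on `total`; the clause gives
 * each worker a private partial sum that is combined at the end. */
double total_sum(const double *field)
{
    double total = 0.0;
    #pragma acc parallel loop reduction(+:total) copyin(field[0:NCELLS*NLEVS])
    for (int idx = 0; idx < NCELLS * NLEVS; ++idx) {
        total += field[idx];
    }
    return total;
}
```

Both scaling functions produce the same result; the flattened version simply makes the full iteration space explicit rather than relying on the collapse clause. The reduction example shows the kind of shared-accumulator update that race condition detection is meant to catch.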