Abstract
Porting a complex scientific code, such as the E3SM land model (ELM), to a new computing architecture is challenging. This paper presents design strategies and technical approaches for developing an ELM ecosystem dynamics model on NVIDIA GPUs using OpenACC compiler directives. The code has been refactored with advanced OpenACC features (such as deepcopy and routine directives) to reduce memory consumption and to increase the level of parallelism through parallel loop reconstruction and new data structures. As a result, on a single NVIDIA V100, the optimized parallel implementation achieved a speedup of more than 140 times (50 ms vs. 7600 ms) over a naive implementation that uses the OpenACC routine directive and parallelizes the code across existing loops. On a fully loaded computing node with 44 CPUs and 6 GPUs, the code achieved a speedup of more than 3.0 times over the original code running on the CPUs. Furthermore, the memory footprint of the optimized parallel implementation is 300 MB, which is around 15% of the 2.15 GB consumed by the naive implementation. This study is the first effort to develop an efficient GPU implementation of the ELM component to support ultra-high-resolution land simulations at continental scales.