Publication

ExaCA: A performance portable exascale cellular automata application for alloy solidification modeling

Publication Type
Journal
Journal Name
Computational Materials Science
Publication Date
Page Number
111692
Volume
214
Issue
1

Modeling the as-solidified grain structures that form during alloy processing is a critical component in understanding process-property relationships, particularly for additive manufacturing (AM), where grain structure is very sensitive to processing conditions. While cellular automata (CA)-based models have proven able to predict aspects of microstructure for several alloys and AM process conditions, their long run times and large resource requirements limit both their utility and the problem sizes to which existing CA models can be applied. As part of the ExaAM project, an initiative within the Exascale Computing Project (ECP) to develop, test, and optimize an exascale-capable coupled and self-consistent model of AM parts, we developed ExaCA (https://github.com/LLNL/ExaCA) to model the liquid–solid phase transformation in the wake of AM melt pools. The CA-based code is parallelized using MPI and the Kokkos programming model, the latter enabling simulation on both CPUs and GPUs within a single-source implementation. We detail the steps taken to transform a baseline, MPI-based CA code into one that is performant on CPUs and GPUs. Performance testing of ExaCA on Summit (a pre-exascale machine at Oak Ridge National Laboratory) was used to quantify the CPU–GPU speedup on equal numbers of nodes. Testing showed CPU performance comparable to the MPI-only CA code and a 5-20x speedup when running AM-based test problems on GPUs. The improved performance of CA through GPU utilization and the performance-portable nature of ExaCA will enable accurate part-scale modeling by harnessing the power of current and future generations of high-performance computing resources. Future work will include improving the strong scaling of ExaCA on GPUs by reducing load imbalance associated with the locality of the problem, and continuing performance optimization across exascale hardware.