Strategies for Integrating Deep Learning Surrogate Models with HPC Simulation Applications

by Junqi Yin, Feiyi Wang, Mallikarjun Shankar

Publication Type

Conference Paper

Book Title

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Publication Date

May 2022

Page Numbers

1256 to 1265

Publisher Location

New Jersey, United States of America

Conference Name

ExSAIS 2022: Workshop on Extreme Scaling of AI for Science, co-located with IPDPS 2022

Conference Location

Lyon, France

Conference Sponsor

IEEE

Conference Date

May 30, 2022 - Jun 3, 2022

Abstract

The emerging trend of the convergence of high performance computing (HPC), machine learning/deep learning (ML/DL), and big data analytics presents a host of challenges for large-scale computing campaigns that seek best practices to interleave traditional scientific simulation-based workloads with ML/DL models. A portfolio of systematic approaches to incorporate deep learning into modeling and simulation serves a vital need when we support AI for science at a computing facility. In this paper, we evaluate several strategies for deploying deep learning surrogate models in a representative physics application on supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). We discuss a set of recommended deployment architectures and implementation approaches. We analyze and evaluate these alternatives and show their performance and scalability up to 1000 GPUs on two mainstream platforms equipped with different deep learning hardware and software stacks.
