Abstract
Many fields within scientific computing have embraced advances in big-data analysis and machine learning, which often require the deployment of large, distributed, and complex workflows that may combine neural-network training, simulation, inference, database queries, and data analysis in asynchronous, parallel, and pipelined execution frameworks. This shift has brought into focus the need for scalable, efficient workflow-management solutions that provide reproducibility, error and provenance handling, traceability, and checkpoint-restart capabilities, among other features. Here, we discuss challenges and best practices for deploying exascale-generation computational science workflows on resources at the Oak Ridge Leadership Computing Facility (OLCF). We present our experiences with large-scale deployment of distributed workflows on the Summit supercomputer, including workflows for bioinformatics and computational biophysics, materials science, and deep learning model optimization. We also present problems encountered, and solutions developed, when working within a Python-centric software base on traditional HPC systems, and discuss the steps that will be required before the convergence of HPC, AI, and data science can be fully realized. Our results point to a wealth of exciting new possibilities for harnessing this convergence to tackle new scientific challenges.