
Toward Improved Support for Loosely Coupled Large Scale Simulation Workflows...

by Swen Boehm, Wael R Elwasif, Thomas J Naughton III, Geoffroy R Vallee
Publication Type
Conference Paper
Conference Name
Annual Cray User Group
Conference Location
Lugano, Switzerland

High-performance computing (HPC) workloads are increasingly leveraging loosely coupled large-scale simulations. Unfortunately, most large-scale HPC platforms, including Cray/ALPS environments, are designed for the execution of long-running jobs based on coarse-grained launch capabilities (e.g., one MPI rank per core on all allocated compute nodes). This assumption limits capability-class workload campaigns that require large numbers of discrete or loosely coupled simulations, and where time-to-solution is an untenable pacing issue. This paper describes the challenges related to the support of fine-grained launch capabilities that are necessary for the execution of loosely coupled large-scale simulations on Cray/ALPS platforms. More precisely, we present the details of an enhanced runtime system to support this use case, and report on initial results from early testing on systems at Oak Ridge National Laboratory.