Abstract
Scientific workflows are becoming increasingly important in high performance computing (HPC) settings, as growing hardware capabilities make running many simultaneous heterogeneous tasks both more feasible and more appealing. Currently, no HPC-based workflow platform supports a dynamically adaptable workflow with interactive steering and analysis at run time. Furthermore, most workflow programs fix compute resources for a given instance, which can waste expensive allocation resources as tasks are spawned and killed. Here we describe the design and testing of a run-time-interactive, adaptable, steered workflow tool capable of executing thousands of parallel tasks without an MPI programming model, using a database management system to manage tasks through multiple live connections. We find that on the Oak Ridge Leadership Computing Facility's pre-exascale Summit supercomputer, workflows with thousands of simultaneous tasks can be launched and interactively steered with negligible latency. For particle simulation and analysis tasks that run for minutes to hours, this paradigm offers the prospect of a robust and efficient means of exploring simulation space with on-the-fly analysis and adaptation.