Publication

Driving Next-Generation Workflows from the Data Plane

by Frederic Suter, Rafael Ferreira Da Silva, Ana Gainaru, Scott A Klasky
Publication Type
Conference Paper
Book Title
2023 IEEE 19th International Conference on e-Science (e-Science)
Page Numbers
1 to 10
Publisher Location
New Jersey, United States of America
Conference Name
WORKS 2023 workshop at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2023
Conference Location
Denver, Colorado, United States of America
Conference Sponsor
ACM, IEEE

We observe the emergence of a new generation of scientific workflows that process data produced at a sustained rate by scientific instruments and large-scale numerical simulations. This data is consumed by multiple analysis, visualization, or machine learning components, not only to enable inference and justify the scientific program, but also to monitor and steer the evolution of these experiments. In such workflows, moving intermediate data efficiently matters more to performance than efficiently scheduling computational tasks. However, most traditional workflow management systems focus on optimizing task scheduling first and deal with data management afterward, assuming a “move little, compute for long” model, which makes them unfit for the efficient management of this new generation of workflows. Therefore, we advocate for a new way to manage scientific workflows. We propose to consider an efficiently and independently managed data plane that can store and stream data. The compute components of a workflow, located in the application plane, can then interact with the data plane while being abstracted from the complexities of data management. The role of a workflow management system then becomes that of a control plane that allows users to connect services together to execute the workflow and that manages connections between the application and data planes. In this position paper, we characterize several next-generation workflow motifs and describe how their interaction with the data plane challenges traditional workflow management systems. Then, we express a set of requirements that a workflow management system should meet to efficiently manage next-generation workflows at different scales. Based on these requirements, we present our vision of driving next-generation workflows from the data plane and list the remaining open challenges.
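
The separation into data, application, and control planes described in the abstract can be illustrated with a toy sketch. This is not the authors' design or any existing system: the class names `DataPlane` and `ControlPlane`, their methods, and the stream names are all hypothetical, and an in-memory queue stands in for what would really be a storage and streaming service.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Tuple


class DataPlane:
    """Hypothetical in-memory data plane: buffers items on named streams."""

    def __init__(self) -> None:
        self._streams: Dict[str, Deque[Any]] = defaultdict(deque)

    def put(self, stream: str, item: Any) -> None:
        """Append one item to a stream."""
        self._streams[stream].append(item)

    def get_all(self, stream: str) -> List[Any]:
        """Drain and return every item currently buffered on a stream."""
        items = list(self._streams[stream])
        self._streams[stream].clear()
        return items


class ControlPlane:
    """Hypothetical control plane: wires compute components to streams."""

    def __init__(self, data_plane: DataPlane) -> None:
        self.data_plane = data_plane
        # Each link is (input stream, compute component, output stream).
        self._links: List[Tuple[str, Callable[[Any], Any], str]] = []

    def connect(self, source: str, component: Callable[[Any], Any], sink: str) -> None:
        """Register a component that reads from `source` and writes to `sink`."""
        self._links.append((source, component, sink))

    def step(self) -> None:
        """Run every component once, in the order the links were registered."""
        for source, component, sink in self._links:
            for item in self.data_plane.get_all(source):
                self.data_plane.put(sink, component(item))


# Application plane: plain functions that never touch storage directly.
plane = DataPlane()
control = ControlPlane(plane)
control.connect("raw", lambda x: x * 2, "scaled")
control.connect("scaled", lambda x: x + 1, "analyzed")

for sample in (1, 2, 3):  # stands in for an instrument or simulation
    plane.put("raw", sample)

control.step()
result = plane.get_all("analyzed")
print(result)
```

In this sketch the compute components are ordinary functions with no knowledge of where data lives, which mirrors the abstraction the abstract argues for; a real data plane would additionally have to handle sustained data rates, persistence, and distribution across resources.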