Research Highlight

New look at supercomputing workflows supports more effective collaboration

Scientists at the Department of Energy’s Oak Ridge National Laboratory are examining the diverse supercomputing workflow management systems in use in the United States and around the world to help supercomputers work together more effectively and efficiently.

Because supercomputers have largely developed in isolation from each other, existing modeling and simulation, grid/data analysis, and optimization workflows meet highly specific needs and therefore cannot easily be transferred from one computing environment to another.

Divergent workflow management systems can make it difficult for research scientists at national laboratories to collaborate with partners at universities and international supercomputing centers to create innovative workflow-based solutions that are the strength and promise of supercomputing.

Led by Jay Jay Billings, team lead for the Scientific Software Development group in ORNL’s Computing and Computational Sciences Directorate, the scientists have proposed a “building blocks” approach in which individual components from multiple workflow management systems are combined in specialized workflows.

Billings worked with Shantenu Jha of the Computational Science Initiative at Brookhaven National Laboratory and Rutgers University, and Jha presented their research at the 2017 Workshop on Open Source Supercomputing in Denver in November 2017. Their article appears in the workshop’s proceedings.

Improved workflow technologies could assist researchers using supercomputers, such as the laboratory’s Cray XK7 known as Titan, in the future. (hi-res image)
The researchers began by analyzing how existing workflow management systems work—the tasks and data they process, the order of execution, and the components involved. Factors that can be used to define workflow management systems include whether a workflow is long or short running, runs internal cycles or in linear fashion with an endpoint, and requires humans to complete. Long used to understand business processes, the workflow concept was introduced in scientific contexts where automation was useful for research tasks such as setting up and running problems on supercomputers and then analyzing the resulting data.

Viewed through the prism of today’s complex research endeavors, supercomputers’ workflows clearly have disconnects that can hamper scientific advancement. For example, Billings pointed out that a project might draw on multiple facilities’ work while acquiring data from experimental equipment, performing modeling and simulation on supercomputers, and conducting data analysis using grid computers or supercomputers. Workflow management systems with few common building blocks would require installation of one or more additional workflow management systems—a burdensome level of effort that also causes work to slow down.

“Poor or nonexistent interoperability is almost certainly a consequence of the ‘Wild West’ state of the field,” Billings said. “And lack of interoperability limits reusability, so it may be difficult to replicate data analysis to verify research results or adapt the workflow for new problems.”

The open building blocks workflows concept being advanced by ORNL’s Scientific Software Development group will enable supercomputers around the world to work together to address larger scientific problems that require workflows to run on multiple systems for complete execution.

Future work includes testing the hypothesis that the group’s approach is more scalable and sustainable and a better practice.

This research is supported by DOE and ORNL’s Laboratory Directed Research and Development program.

ORNL is managed by UT–Battelle for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit