
Workflow Systems Group

Optimizing the end-to-end management of scientific data to enable knowledge discovery.

The mission of the Workflow Systems group is to advance scientific discovery through innovative computational methods that make complex data more accessible, easier to process, and more meaningful. We develop cutting-edge solutions for managing and analyzing massive scientific datasets, with a particular focus on artificial intelligence, high-performance computing, and data reduction techniques that lower energy consumption while maintaining scientific accuracy. Through this work, we aim to accelerate breakthroughs across multiple scientific domains while promoting sustainable computing practices that will shape the future of scientific research. This work includes:

  • Work on AI/ML and high-performance computing
  • Focus on managing and analyzing large scientific datasets
  • Emphasis on energy efficiency and sustainability
  • Development of data compression techniques
  • Cross-cutting impact across multiple scientific domains
  • Balance between computational efficiency and scientific accuracy

The Workflow Systems group is advancing the frontiers of scientific computing through innovative approaches to coupled workflow automation, scientific data management, and data reduction. HPC and AI permeate all areas of our research, allowing us to work with hundreds of mission-critical applications in areas including fusion energy, accelerator science, aerospace, and radio astronomy. Our group has created software that includes:

  • ADIOS (Adaptable Input/Output System) is a high-performance I/O middleware developed to efficiently manage the storage, retrieval, and movement of large-scale scientific data in high-performance computing (HPC) environments. ADIOS provides a flexible, scalable, and portable framework for reading and writing data from simulations, experiments, or instruments. It supports a variety of data transport methods and file formats, allowing users to optimize I/O performance based on their application's needs. With its support for in situ data processing and tight integration with analysis and visualization tools, ADIOS plays a critical role in reducing I/O bottlenecks and enabling real-time data insights in exascale and data-intensive computing workflows.
  • MGARD (Multigrid Adaptive Reduction of Data) is a high-performance, error-bounded compression and refactoring framework designed for scientific data. Built on multigrid numerical methods, MGARD enables efficient storage and transmission of large, multidimensional floating-point datasets by decomposing them hierarchically and preserving user-specified accuracy. It supports in situ workflows, progressive data access, and AI-readiness by maintaining critical data features while significantly reducing size. MGARD is widely used in HPC applications to overcome I/O and storage limitations without sacrificing scientific fidelity.
  • SimGrid is an open-source simulation framework for modeling and evaluating the performance of large-scale distributed systems such as HPC platforms, cloud infrastructures, and peer-to-peer networks. It enables users to simulate system behavior with high scalability and accuracy, using detailed models of computation and communication. With support for MPI application simulation and customizable scheduling policies, SimGrid is widely used in research to study resource allocation, fault tolerance, and energy efficiency—allowing reproducible experiments without the need for physical deployments.
  • EFFIS (Experimental Framework for Integrated Scientific Simulations) is a DOE-developed software ecosystem designed to support large-scale scientific workflows by enabling the integration of simulation, data management, and in situ analysis. Built to run on high-performance computing systems, EFFIS provides tools for orchestrating complex multi-physics simulations, managing data flow between components, and performing real-time monitoring and visualization. By streamlining these tasks, EFFIS enhances productivity and scientific insight, especially in exascale environments where efficient coordination of computation and data is critical.
  • CMSD (Campaign Management of Scientific Data) coordinates the planning, collection, processing, storage, and analysis of data generated during large-scale scientific experiments or simulations. It involves organizing data across distributed teams and facilities, ensuring consistency, traceability, and compliance with data standards. Effective campaign management enables seamless integration of simulation and experimental results, supports real-time decision-making, and ensures data is accessible, well-documented, and AI-ready for future analysis. It is critical for maximizing scientific return, particularly in complex, multi-institutional research efforts such as fusion energy, climate modeling, or high-energy physics.
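The step-based, engine-pluggable I/O pattern described for ADIOS above can be illustrated with a toy sketch. This is not the ADIOS API; `FileEngine`, `put_step`, and `run_simulation` are hypothetical names invented for illustration only:

```python
import io
import json

# Conceptual sketch only, NOT the real ADIOS API. It illustrates step-based
# output behind a pluggable "engine" abstraction, so an application's write
# loop stays unchanged when the transport (file, staging, network) changes.

class FileEngine:
    """Toy engine that appends each simulation step as one JSON line."""
    def __init__(self, sink):
        self.sink = sink

    def put_step(self, step, variables):
        self.sink.write(json.dumps({"step": step, "vars": variables}) + "\n")

def run_simulation(engine, n_steps):
    # The application loop is agnostic to where the data actually goes.
    for step in range(n_steps):
        temperature = [20.0 + step * 0.5 + i for i in range(4)]  # fake field
        engine.put_step(step, {"temperature": temperature})

buf = io.StringIO()
run_simulation(FileEngine(buf), n_steps=3)
print(buf.getvalue().count("\n"))  # one record per step → 3
```

In the real middleware, swapping the destination of the data is a matter of configuration rather than rewriting the simulation's write loop, which is what makes the abstraction valuable at scale.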
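The multigrid idea behind MGARD's error-bounded reduction can be sketched in miniature: coarse samples plus interpolation predict the fine points, and only corrections larger than the user's error bound are stored. This is a conceptual toy on a 1D signal, not MGARD's actual algorithm or API:

```python
# Conceptual sketch only, not MGARD: hierarchical, error-bounded reduction.
# Every dropped correction has magnitude <= tol, so the reconstruction
# error is guaranteed to stay within the user-specified bound.

def reduce_1d(data, tol):
    coarse = data[::2]                      # coarse grid: every other sample
    corrections = {}
    for i in range(1, len(data), 2):        # fine points predicted from neighbors
        left = data[i - 1]
        right = data[i + 1] if i + 1 < len(data) else left
        delta = data[i] - 0.5 * (left + right)
        if abs(delta) > tol:                # keep only significant corrections
            corrections[i] = delta
    return coarse, corrections

def reconstruct_1d(coarse, corrections, n):
    out = [0.0] * n
    out[::2] = coarse                       # restore the coarse grid
    for i in range(1, n, 2):                # re-predict, then apply corrections
        left = out[i - 1]
        right = out[i + 1] if i + 1 < n else left
        out[i] = 0.5 * (left + right) + corrections.get(i, 0.0)
    return out

data = [float(i * i) for i in range(9)]     # smooth quadratic signal
coarse, corr = reduce_1d(data, tol=1.5)
approx = reconstruct_1d(coarse, corr, len(data))
err = max(abs(a - b) for a, b in zip(data, approx))
print(len(corr), err <= 1.5)               # no corrections kept, bound holds
```

The real framework applies this recursively across levels of a multidimensional hierarchy, which is what lets it shrink smooth scientific fields dramatically while honoring the requested accuracy.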
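The kind of modeling SimGrid performs can be illustrated with a minimal discrete-event sketch (not SimGrid's API; the greedy scheduling policy, host speeds, and task sizes are made up for illustration):

```python
import heapq

# Conceptual sketch only, not SimGrid: a minimal discrete-event simulation
# of compute tasks on hosts with different speeds, the kind of model such
# frameworks use to predict a schedule's finish time without a deployment.

def simulate(tasks, host_speeds):
    """Greedy list scheduling: each task (in GFLOP) goes to the host that
    frees up earliest; returns the simulated makespan."""
    # heap of (time the host becomes free, host index)
    free_at = [(0.0, h) for h in range(len(host_speeds))]
    heapq.heapify(free_at)
    makespan = 0.0
    for gflop in tasks:
        t, host = heapq.heappop(free_at)    # earliest-available host
        finish = t + gflop / host_speeds[host]
        heapq.heappush(free_at, (finish, host))
        makespan = max(makespan, finish)
    return makespan

# Two hosts (fast and slow, in GFLOP/s), four equal tasks of 100 GFLOP each.
print(simulate([100.0] * 4, host_speeds=[50.0, 25.0]))  # → 6.0
```

Because the entire experiment is a deterministic program, it can be rerun and varied at will, which is the reproducibility advantage the paragraph above describes.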

What makes our group's work particularly impactful is its practical application across multiple scientific domains. Our tools and methods are being used in fusion energy research, materials science, and cosmology, helping to address fundamental questions about our universe while advancing sustainable computing practices. Our innovations in energy-efficient computing and data management are particularly relevant as scientific computing faces growing sustainability challenges.