A Vision for Coupling Operation of US Fusion Facilities with HPC Systems and the Implications for Workflows and Data Management

Publication Type: Conference Paper
Book Title: Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation
Page Numbers: 87–100
Volume: 1690
Publisher Location: Cham, Switzerland
Conference Name: 22nd Smoky Mountains Computational Sciences and Engineering Conference (SMC)
Conference Location: Oak Ridge, Tennessee, United States of America
Conference Sponsor: Oak Ridge National Laboratory

The operation of large US Department of Energy (DOE) research facilities, such as the DIII-D National Fusion Facility, results in the collection of complex multi-dimensional scientific datasets, both experimental and model-generated. In the future, it is envisioned that integrated data analysis coupled with large-scale high performance computing (HPC) simulations will be used to improve experimental planning and operation. Practically, massive datasets from these simulations provide the physics basis for generating both reduced semi-analytic models and machine-learning-based models. Storing both HPC simulation datasets (generated at US DOE leadership computing facilities) and experimental datasets presents significant challenges. In this paper, we present a vision for a DOE-wide data management workflow that integrates US DOE fusion facilities with leadership computing facilities. Data persistence and long-term availability beyond the length of allocated projects are essential, particularly for verification and recalibration of artificial intelligence and machine learning (AI/ML) models. Because these datasets are often generated and shared among hundreds of users across multiple leadership computing facility centers, they would benefit from cross-platform accessibility, persistent identifiers (e.g., digital object identifiers, or DOIs), and provenance tracking. The need to handle different data access patterns suggests that a combination of low-cost, high-latency systems (e.g., for storing ML training sets) and high-cost, low-latency systems (e.g., for real-time, integrated machine-control feedback) may be required.
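
As a purely illustrative sketch (not taken from the paper), the Python snippet below shows one way such a workflow might represent a shared dataset: a record carrying a persistent identifier, an auditable provenance history, and an access-pattern flag used to route the data to either a low-cost archival tier or a low-latency tier. All names here (DatasetRecord, the tier labels, the example DOI) are hypothetical assumptions, not part of the described infrastructure.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class DatasetRecord:
        """Hypothetical metadata record for a dataset shared between a fusion facility and an LCF."""
        doi: str                          # persistent identifier, e.g. a DOI
        source: str                       # originating facility or simulation campaign
        access_pattern: str = "archival"  # "archival" (e.g. ML training sets) or "realtime" (control feedback)
        provenance: list = field(default_factory=list)  # ordered processing history

        def add_provenance(self, actor: str, action: str) -> None:
            # Append a timestamped entry so the dataset's history can be audited and
            # AI/ML models can later be verified or recalibrated against it.
            self.provenance.append({
                "actor": actor,
                "action": action,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })

        def storage_tier(self) -> str:
            # Low-cost, high-latency storage for bulk archival data;
            # high-cost, low-latency storage for near-real-time use.
            return "tape_archive" if self.access_pattern == "archival" else "nvme_cache"

    # Example: register a simulation output and record a post-processing step.
    record = DatasetRecord(doi="10.0000/example", source="DIII-D")
    record.add_provenance("analysis-pipeline", "generated reduced-model training set")
    print(record.storage_tier())  # -> "tape_archive"

In a real deployment, records like this would presumably live in facility-level catalogs and be resolved across centers via the persistent identifier, rather than as in-process objects; the sketch only illustrates the kinds of fields such a cross-facility workflow would need to track.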