Abstract
Nuclear fusion holds the promise of an endless source of energy. Several research experiments across the world, as well as joint modeling and simulation efforts between the nuclear physics and high performance computing communities, are actively preparing for the operation of the International Thermonuclear Experimental Reactor (ITER). Both experimental reactors and their simulated counterparts generate data that must be analyzed quickly and in a resilient way to support decision making for the configuration of subsequent runs or to prevent a catastrophic failure. However, the cost of the traditional techniques used to improve the resilience of analysis workflows, i.e., replicating datasets and computational tasks, becomes prohibitive with the explosion of the volume of data produced by modern instruments and simulations. Therefore, we advocate in this paper for an alternative approach based on data reduction and data streaming. The rationale is that, by allowing for a reasonable, controlled, and guaranteed loss of accuracy, it becomes possible to transfer smaller amounts of data, shorten the execution time of analysis workflows, and lower the cost of the replication needed to increase resilience. We present our research and development roadmap towards resilient near real-time analysis workflows in fusion energy science, along with early results showing that data streaming and data reduction are a promising way to speed up the execution and improve the resilience of analysis workflows.