Abstract
The rapid growth in scientific data generation is outpacing advancements in the computing systems needed for efficient storage, transfer, and analysis, particularly in the context of exascale computing. With the deployment of first-generation exascale systems and next-generation experimental facilities, this gap is widening, making effective data reduction techniques essential for managing enormous data volumes. Over the past decade, various data reduction methods, including lossless compression, error-controlled lossy compression, and data refactoring, have been developed to accelerate I/O in scientific workflows. Despite achieving significant reductions in data volume, these methods introduce considerable computational overhead, which can itself become the new bottleneck in data processing. GPU-accelerated data reduction algorithms have been introduced to mitigate this overhead, but challenges remain in integrating them into exascale workflows: limited portability across GPU architectures, substantial memory transfer overhead, and reduced scalability on dense multi-GPU systems. To address these challenges, we propose HPDR, a high-performance, portable data reduction framework. HPDR enables state-of-the-art reduction algorithms to run across diverse processor architectures while reducing memory transfer overhead to 2.3% of the original, yielding up to 3.5× higher throughput than existing solutions. It also achieves up to 96% of the theoretical speedup in multi-GPU settings. In addition, evaluations of accelerated I/O at scale on up to 1,024 nodes of the Frontier supercomputer demonstrate that HPDR attains up to 103 TB/s of reduction throughput, accelerating parallel I/O performance by up to 4× compared to existing data reduction routines. These results highlight the potential of HPDR to substantially improve data reduction efficiency in exascale computing environments.