Achievement
High-performance computing systems consume vast amounts of energy, particularly when moving data between different parts of the machine. To address this challenge, a research team investigated a novel strategy for optimizing data transfers. They compared a standard method of sending many small, individual data packets with a newer approach that bundles, or "aggregates," these small packets into larger, more efficient shipments. The experiments, conducted on the Frontier supercomputer, revealed that the aggregation strategy was not only significantly faster but also used substantially less total energy. This work demonstrates a practical and effective way to reduce the energy footprint of supercomputers by changing how they communicate data.
Significance and Impact
This energy-saving technique has several important implications for the future of high-performance computing:
- It provides a concrete software strategy to combat the growing energy consumption of exascale supercomputers.
- It directly tackles the data movement bottleneck, where bursts of small data requests can cause congestion and waste energy as processors wait.
- It reveals a powerful, counter-intuitive principle: using more power momentarily to finish tasks much faster leads to massive overall energy savings.
- It offers clear design insights for building the next generation of energy-efficient parallel programming models, essential for sustainable supercomputing.
Research Details
To validate this approach, the team designed a rigorous set of experiments:
- Experiments were performed on Frontier, a world-class supercomputer located at Oak Ridge National Laboratory.
- The team ran standard benchmark workloads designed to exercise complex data communication patterns.
- They directly compared the performance and energy consumption of the traditional, one-by-one data transfer method against the new message aggregation strategy.
- The system's built-in hardware counters were used to precisely measure execution time, moment-to-moment power draw, and the total energy consumed by the processor and memory.
Facility
This research was conducted using resources of the Oak Ridge Leadership Computing Facility, a U.S. Department of Energy Office of Science User Facility.
Sponsor/Funding
This work was funded by the Strategic Partnership Projects Funding Office through Los Alamos National Laboratory (IAN 619215901) under the project “OpenSHMEM - Standardized API for parallel programming in the Partitioned Global Address Space”.
Principal Investigator and Team
PI: Oscar Hernandez, Oak Ridge National Laboratory. Team: Aaron Welch (Oak Ridge National Laboratory); Wendy Poole and Stephen Poole (Los Alamos National Laboratory).
Citation and DOI
Hernandez, O., Welch, A., Poole, W., Poole, S. (2025). Preliminary Study on Message Aggregation Optimizations for Energy Savings in PGAS Models. In: Barik, R., Gupta, R., Palsberg, J. (eds) Principles and Practices of Building Parallel Software. Lecture Notes in Computer Science, vol 14564. Springer, Cham. https://doi.org/10.1007/978-3-031-97492-2_12
Summary
This study concludes that message aggregation is a powerful technique for reducing energy consumption in high-performance computing. The research revealed a crucial trade-off: while the aggregation method draws more power moment-to-moment, it completes tasks so much more quickly that it results in dramatic overall energy savings—in some cases reducing energy use by a factor of 41. This finding is vital for the future of supercomputing, as it provides a clear path toward building more sustainable and efficient systems. By optimizing how data is moved, researchers can unlock new capabilities for scientific discovery while simultaneously managing the immense power challenges of next-generation machines.