Publication

Preliminary Study on Fine-Grained Power and Energy Measurements on Grace Hopper GH200 with Open-Source Performance Tools

Publication Type
Conference Paper
Book Title
HPC Asia '25 Workshops: Proceedings of the 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops
Publication Date
Page Numbers
11 to 22
Publisher Location
New York, New York, United States of America
Conference Name
International Workshop on Arm-based HPC: Practice and Experience
Conference Location
Hsinchu, Taiwan
Conference Sponsor
National Center for High-performance Computing (NCHC)
Conference Date
-

The increasing adoption of tightly integrated, heterogeneous architectures, combined with the slowdown of Moore’s law, has made power- and energy-driven application optimizations critical to the efficient use of high-performance computing systems. This paper introduces a newly developed open-source toolkit that integrates the Linux hardware monitoring interface hwmon with the Performance Application Programming Interface (PAPI) and the Score-P performance measurement system, thereby enabling fine-grained power and energy measurements for high-performance computing applications. Our primary target platform is the Wombat test bed, a system based on the NVIDIA GH200 superchip. The toolkit can capture transient power peaks with high temporal resolution (50 ms) and, thanks to Score-P integration, can map power metrics to specific code regions, thereby providing actionable information on power-intensive operations and inefficiencies. The toolkit also provides a holistic view of both the power and the energy consumption of the entire GH200 superchip by covering all major components: the Grace CPU, the Hopper GPU, and the I/O subsystem. Experiments with Locally Self-consistent Multiple Scattering (LSMS), an application for first-principles calculations of materials developed at Oak Ridge National Laboratory, demonstrate the tool’s ability to identify transient power spikes and uncover opportunities for energy-aware optimizations. Additionally, we introduce a Python-based utility for converting Open Trace Format 2 traces to Parquet format, thus enabling advanced data analysis, such as numerical integration of power data for accurate energy profiling.
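
The abstract mentions numerical integration of power data to obtain energy. As a minimal sketch of that idea, and not the toolkit's actual implementation, the following Python applies the trapezoidal rule to timestamped power samples. The function name `energy_joules` and the sample values are hypothetical. Only the 50 ms sampling interval comes from the abstract.

```python
def energy_joules(times_s, power_w):
    """Approximate energy in joules from power samples using the trapezoidal rule.

    times_s: sample timestamps in seconds (non-decreasing)
    power_w: power readings in watts, one per timestamp
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


# Hypothetical samples at a 50 ms interval with one transient spike
times = [0.00, 0.05, 0.10, 0.15, 0.20]
power = [100.0, 100.0, 300.0, 100.0, 100.0]
total = energy_joules(times, power)
print(f"{total:.1f} J")
```

With these made-up samples, the spike adds 10 J over the 20 J that a flat 100 W baseline would consume in 0.2 s. A coarser sampling interval could miss such a spike entirely, which is why temporal resolution matters for energy accuracy.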