Publication

Revealing power, energy and thermal dynamics of a 200PF pre-exascale supercomputer...

by Woong Shin, Vladyslav Oles, Ahmad Maroof Karimi, John A Ellis, Feiyi Wang
Publication Type
Conference Paper
Book Title
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publication Date
Page Numbers
1 to 14
Publisher Location
United States of America
Conference Name
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Conference Location
St. Louis, Missouri, United States of America
Conference Sponsor
SIGHPC, IEEE Computer Society, TCHPC
Conference Date
-

As we approach the exascale computing era, a focused understanding of power consumption and the constraints it places on HPC architectures and applications is becoming increasingly important. Summit, located at the Oak Ridge Leadership Computing Facility (OLCF), is one of the fastest and largest pre-exascale platforms in operation today. This paper provides a first-order examination and analysis of power consumption at the component, node, and system levels, using data from all 4,626 Summit compute nodes, each reporting over 100 metrics at 1 Hz, for the entire year of 2020. We also investigate the power characteristics and energy efficiency of over 840k Summit jobs, along with 250k GPU failure logs, for further operational insights. To the best of our knowledge, this is the first systematic analysis of power data from an HPC system at this scale.