The Challenge of Disproportionate Importance of Temporal Features in Predicting HPC Power Consumption...

by Chengcheng Li, Ahmad Maroof Karimi, Woong Shin, Hairong Qi, Feiyi Wang
Publication Type
Conference Paper
Book Title
Proceedings of 2021 IEEE International Conference on Cluster Computing (CLUSTER)
Publication Date
Page Numbers
632 to 636
Publisher Location
United States of America
Conference Name
The IEEE Cluster Conference Workshops: Workshop on Monitoring and Analysis for HPC Systems Plus Applications (HPCMASPA 2021)
Conference Location
Portland, Oregon, United States of America
Conference Sponsor
IEEE, IEEE Computer Society, SIGHPC
Conference Date

In this work, we demonstrate the challenges of predicting HPC cluster power consumption in the face of significant temporal skew in power consumption behavior. Predicting large power swings that span several megawatts has significant operational value for HPC centers. However, such prediction is challenging because these events are relatively rare and deviate abruptly or disjointly from average power consumption levels. To study the impact of this challenge, we trained a recurrent neural network (RNN), as a reasonably sophisticated model, to predict power consumption using one year's worth of node power consumption data from the Summit supercomputer at the Oak Ridge Leadership Computing Facility. By studying the prediction results, we found that although a simple application of RNN models gives good results at average power consumption levels, it fails to predict the power swings that have more operational value. Based on these results, we discuss potential next steps for addressing such issues, aiming toward robust use of power prediction techniques in HPC operations.
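
The gap between average-case accuracy and accuracy on rare swings can be illustrated with a minimal sketch. This is not the authors' RNN or the Summit data: the trace is synthetic, the constant predictor is only a stand-in for a model that tracks the average level well, and every name and value (`make_trace`, `evaluate`, the 5 MW base, the 4 MW swing, the 2 MW threshold) is a hypothetical choice made for illustration.

```python
import random


def make_trace(n=1000, base=5.0, swing=4.0, period=50, seed=0):
    """Synthetic power trace in MW (assumed values, not Summit data).

    Most samples sit near `base`; every `period`-th sample carries an
    abrupt jump of `swing` MW, so swings are about 2% of the data.
    """
    rng = random.Random(seed)
    trace = []
    for i in range(n):
        level = base + rng.gauss(0.0, 0.1)  # small noise around the typical level
        if i % period == period - 1:
            level += swing  # rare, abrupt multi-megawatt swing
        trace.append(level)
    return trace


def mean_abs_error(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(actual - pred) for actual, pred in pairs) / len(pairs)


def evaluate(trace, base=5.0, swing_threshold=2.0):
    """Compare the overall error with the error on swing samples only.

    The predictor is a stand-in that always outputs the typical level,
    mimicking a model that is accurate on average but misses swings.
    """
    pairs = [(actual, base) for actual in trace]
    swings = [(a, p) for a, p in pairs if abs(a - base) > swing_threshold]
    return mean_abs_error(pairs), mean_abs_error(swings), len(swings)


overall_mae, swing_mae, n_swings = evaluate(make_trace())
print(f"overall MAE: {overall_mae:.2f} MW")
print(f"swing-only MAE: {swing_mae:.2f} MW over {n_swings} samples")
```

Under these assumptions the overall error stays small because swings make up only about 2% of the samples, while the error measured on the swings alone is close to the full swing size. Reporting an event-conditioned error alongside the aggregate one is one simple way to expose this kind of skew.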