Monitoring Cray Cooling Systems...

by Don E Maxwell, Matthew A Ezell, Jeff Becklehimer, Matthew J Donovan, Christopher C Layton
Publication Type
Conference Paper
Publication Date
Conference Name
Cray User Group
Conference Location
Lugano, Switzerland
Conference Date

While sites generally have systems in place to monitor the health of the Cray computers themselves, the cooling systems are often ignored until a computer failure prompts an investigation into its cause. Both the Liebert XDP units used to cool the Cray XE/XK models and the Cray proprietary cooling system used for the Cray XC30 models provide data useful for health monitoring. Unfortunately, this valuable information is often available only through custom solutions that a center-wide monitoring system cannot access, or it is simply ignored. This paper discusses the methods and tools used to harvest the available monitoring data and describes the implementation needed to integrate that data into a center-wide monitoring system at the Oak Ridge National Laboratory.