Skip to main content
SHARE
Publication

Analysis and Modeling of the End-to-End I/O Performance on OLCF's Titan Supercomputer...

Publication Type
Conference Paper
Book Title
2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
Publication Date
Page Numbers
1 to 9
Conference Name
IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
Conference Location
Bangkok, Thailand
Conference Sponsor
IEEE
Conference Date
-

With the increase of scale and complexity seen in a variety of leadership-class scientific computation and simulation applications, it has become more important to understand their I/O performance characteristics. The user-observed performance is a combination of properties of how the application is using the HPC facility, as well as how others' use of the facility causes variability in the static machine capabilities. Our work leverages statistical analysis of I/O performance data gathered with fine time resolution over a full week from Titan supercomputer. Based on observed properties of the distribution of I/O latencies, we build a three-state hidden Markov model (HMM) to characterize the end-to-end I/O performance on Titan. We parameterize our model using part of the field-gathered I/O performance data and validate it against the rest. The validation results demonstrate that our model can capture the dynamics of end-to-end I/O performance on Titan accurately.