Experiences evaluating functionality and performance of IBM Power8+ systems (Conference Paper, June 2017)
Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility (Conference Paper, November 2015)
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems... (Conference Paper, June 2015)
Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective (Conference Paper, May 2015)
Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer (Conference Paper, April 2015)
Understanding GPU Errors on Large-scale HPC Systems and the Implications for System Design and Operation... (Conference Paper, February 2015)
Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems... (Conference Paper, November 2014)
I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II... (Conference Paper, May 2014)