Preliminary Study on Fine-Grained Power and Energy Measurements on Grace Hopper GH200 with Open-Source Performance Tools Conference Paper April, 2025
JACC.shared: Leveraging HPC Metaprogramming and Performance Portability for Computations That Use Shared Memory GPUs Conference Paper April, 2025
Software stewardship and advancement of a high-performance computing scientific application: QMCPACK Journal February, 2025
GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM... Conference Paper December, 2017
Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications Conference Paper November, 2017
Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities Conference Paper November, 2017
An evaluation of the state of time synchronization on leadership class supercomputers Journal October, 2017
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale Journal September, 2017
SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems Conference Paper September, 2017
Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale Conference Paper September, 2017