Publication

Navigating Exascale Operational Data Analytics: From Inundation to Insight

Publication Type
Conference Paper
Book Title
SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publication Date
Page Numbers
1795 to 1804
Publisher Location
New Jersey, United States of America
Conference Name
Sustainable Supercomputing (SusSup24) SC24 Workshops - The International Conference for High Performance Computing, Networking, Storage, and Analysis
Conference Location
Atlanta, Georgia, United States of America
Conference Sponsor
IEEE Computer Society, TCHPC, ACM, SIGHPC
Conference Date
-

In this paper, we address the challenges in achieving sustainable data-driven efficiency by providing a detailed exploration of the end-to-end operational data analytics (ODA) framework that evolved through two generations of supercomputer systems at the Oak Ridge Leadership Computing Facility (OLCF). This framework handles large data streams ingested from a heavily instrumented HPC environment that accumulates multiple terabytes of data per day. We outline the multifaceted data life cycle across HPC procurement, operations, and research & development, identifying key obstacles and design decisions that shape effective strategies for building and supporting end-to-end data pipelines. By sharing key insights and lessons learned from our experience, we offer recommendations for the HPC community on enabling sustainable operational data analytics and beyond. Our contributions aim to bridge the gap between the potential and the realized benefits of operational data, guiding future efforts towards integrated and sustainable operational intelligence in high-performance computing environments.