Monitoring Extreme-scale Lustre Toolkit...

by Michael J Brim, Joshua K Lothian
Publication Type
Conference Paper
Publication Date
Conference Name
International Workshop on the Lustre Ecosystem: Challenges and Opportunities
Conference Location
Annapolis, Maryland, United States of America
Conference Sponsor
Oak Ridge National Laboratory, U.S. Department of Defense
Conference Date

We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.