Publication

Profiling the Usage of an Extreme-Scale Archival Storage System...

by Hyogi Sim, Sudharshan S Vazhkudai
Publication Type
Conference Paper
Book Title
2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
Page Numbers
410 to 422
Conference Name
MASCOTS 2019: 27th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Conference Location
Rennes, France
Conference Sponsor
IEEE

Profiling the archival storage system in scientific computing environments has received much less attention than profiling the parallel file system. It is equally important, however, because the archive safely stores the final data products for long periods. In this paper, we analyze eight years' worth of data transfer logs from the archival file system, the High Performance Storage System (HPSS), at the Oak Ridge Leadership Computing Facility (OLCF), which has hosted the world's largest supercomputers and file systems. Our analysis covers about 135 million data transfers to the 80 PB HPSS between 2010 and 2017. We analyze the logs along several dimensions: workload characteristics (e.g., access patterns, access frequency, and temporal behavior), file system characteristics (e.g., directory depth, file system scaling trends, and file types), and scientific user behavior (e.g., domain-specific usage and organization). Based on this analysis, we derive insights into the future evolution of the archive in terms of provisioning, desired features and functionality of the archive software, the role and right-sizing of the archive tiers, quota management, and the importance of smart, efficient metadata and storage management. We believe our study will prove useful both for operating current archival storage systems and for better provisioning future ones.