
Best Practices and Lessons Learned from Deploying and Operating Large-scale Data-centric Parallel File Systems

Publication Type
Conference Paper
Conference Name
SuperComputing 2014 (SC'14)
Conference Location
New Orleans, Louisiana, United States of America

The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple world-class parallel file systems to support its compute, data analysis, and visualization platforms. After deployment, OLCF has continued to hone operating strategies for these systems in response to rapidly changing technologies and user demands. OLCF's file systems have also served as test platforms for new file system features, as technology evaluation platforms for I/O strategies and benchmarks, and as hubs for data storage and transfer for OLCF users. During this process, OLCF has acquired significant expertise in data storage systems, file system software, technology evaluation, benchmarking, and procurement practices. This paper provides an account of our experience and the lessons we learned in acquiring, deploying, and operating large parallel file systems. We believe that these lessons will be useful to the wider HPC community involved in such activities.