Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems...

by Christian Engelmann, Hong H Ong, Stephen L Scott

Publication Type

Conference Paper

Publication Date

March 2009

Conference Name

10th LCI International Conference on High-Performance Clustered Computing (LCI) 2009

Conference Location

Boulder, Colorado, United States of America

Conference Date

Mar 9, 2009 - Mar 12, 2009

Abstract

Diskless high-performance computing (HPC) systems utilizing networked storage have become popular in the last several years. Removing disk drives significantly increases compute node reliability, as disk drives are known to be a major source of failures. Furthermore, networked storage solutions utilizing parallel I/O and replication are able to provide increased scalability and availability. Reducing a compute node to processor(s), memory, and network interface(s) greatly reduces its physical size, which in turn allows for large-scale dense HPC solutions. However, one major obstacle is the requirement by certain operating systems (OSs), such as Linux, for a root file system. While one solution is to remove this requirement from the OS, another is to share the root file system over the networked storage. This paper evaluates three networked file system solutions, NFSv4, Lustre, and PVFS2, with respect to their performance, scalability, and availability features for servicing a common root file system in a diskless HPC configuration. Our findings indicate that Lustre is a viable solution as it meets both scaling and performance requirements. However, certain availability issues regarding single points of failure and control need to be considered.
