Abstract
Diskless high-performance computing (HPC) systems utilizing networked storage have become popular in the last several years. Removing disk drives significantly increases compute node reliability as they are known to be a major source of failures. Furthermore, networked storage solutions utilizing parallel I/O and replication are able to provide increased scalability and availability. Reducing a compute node to processor(s), memory and network interface(s) greatly reduces its physical size, which in turn allows for large-scale dense HPC solutions. However, one major obstacle is the requirement by certain operating systems (OSs), such as Linux, for a root file system. While one solution is to remove this requirement from the OS, another is to share the root file system over the networked storage. This paper evaluates three networked file system solutions, NFSv4, Lustre and PVFS2, with respect to their performance, scalability, and availability features for servicing a common root file system in a diskless HPC configuration. Our findings indicate that Lustre is a viable solution as it meets both, scaling and performance requirements. However, certain availability issues regarding single points of failure and control need to be considered.