Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?

Publication Type

Journal

Journal Name

IEEE Transactions on Computers

Publication Date

November, 2018

Page Numbers

631 to 645

Abstract

In a production high-performance computing (HPC) system, where storage devices are shared among multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose balanced, messaging-based re-routing combined with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that scales to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance improves, while throttling the re-routing to limit its impact on reads. An analytical model is derived to guide the choice of the optimal throttling factor. We thoroughly analyze the overhead of the virtual messaging layer and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified with a synthetic benchmark and has also been used by production applications.