Publication

What is the right balance for performance and isolation with virtualization in HPC?...

Publication Type
Conference Paper
Book Title
Proceedings of Euro-Par 2014 - Parallel Processing Workshops Lecture Notes in Computer Science: 7th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids
Publication Date
Page Numbers
570 to 581
Volume
8805
Conference Name
Euro-Par 2014: International Conference on Parallel Processing
Conference Location
Porto, Portugal
Conference Date
-

The use of virtualization in high-performance computing (HPC) has been suggested as a means to provide tailored services and added functionality that many users expect from full-featured Linux cluster environments. While the use of virtual machines in HPC can offer several benefits, maintaining performance is a crucial factor. In some instances, performance criteria are placed above isolation properties, and the selective relaxation of isolation for performance is an important characteristic when considering resilience for HPC environments that employ virtualization.

In this paper, we consider some of the factors associated with balancing performance and isolation in configurations that employ virtual machines. In this context, we propose a classification of errors based on the concept of "error zones", as well as a detailed analysis of the trade-offs between resilience and performance based on the level of isolation provided by virtualization solutions. Finally, we present results from a set of experiments that use different virtualization solutions and thereby allow further elucidation of the topic.