Abstract
Heterogeneous computing with accelerators is growing in importance in high performance computing (HPC), deep learning (DL), and other areas. Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond the capacity of their hosts. Meanwhile, non-volatile memory (NVM) storage has emerged as a pervasive component of nearly all computing systems, including HPC systems, because NVM provides massive memory capacity at affordable cost and power. Currently, accelerator applications that use NVM must manually orchestrate data movement across multiple memories. This effort typically requires careful restructuring of the application, and it performs well only for applications with simple data-access patterns. To address this issue, we have developed DRAGON, a solution that enables all classes of GP-GPU applications to transparently compute on terabyte datasets residing in NVM, while ensuring the integrity of data buffers stored in NVM. DRAGON leverages the page-faulting mechanism on recent NVIDIA GPUs by extending the capabilities of CUDA Unified Memory (UM). Further, DRAGON improves overall performance by dynamically optimizing accesses to NVM. We empirically evaluate DRAGON on an NVIDIA P100 GPU and a 2.4 TB Micron 9100 NVMe card using traditional HPC kernels and popular DL workloads. Our experimental results show that DRAGON transparently expands memory capacity and exploits Linux's page-cache mechanism to achieve speedups of up to 2.3x over CUDA-UM by automatically overlapping I/O, data transfer, and computation.