Achievement: A team of researchers from Oak Ridge National Laboratory (ORNL), Intel Corporation and the University of Tennessee published an innovative tool-based solution to one of the most perplexing problems facing would-be users of today's most powerful computers: how to effectively utilize complex heterogeneous architectures that include sophisticated memory systems. The authors present results for a range of applications that illustrate how this new technology affects the High Performance Computing (HPC) space and why its potential is exciting. The described toolset relies on new techniques for profiling memory usage during production execution. For the benchmarks in the study, the new approach collects detailed data-tiering guidance with negligible execution time overhead in most cases and less than 10% overhead in the worst case. In addition, the paper designs and implements an online data tiering solution that leverages application feedback to steer data allocation and placement across a heterogeneous memory hierarchy. The new approach, inspired by solutions to the classical ski rental problem, migrates data only when the expected cost of migrating it is outweighed by the cost of leaving it in place. Finally, the work demonstrates the effectiveness of this approach on a state-of-the-art heterogeneous memory system with both conventional DRAM and large-capacity NVRAM. The results show that it significantly outperforms unguided execution on average and achieves speedups ranging from 1.4x to more than 7x for selected HPC workloads. It also attains speedups similar to a comparable offline profiling-based approach after a short startup period.
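The ski rental idea can be illustrated with the textbook break-even rule: keep "renting" (tolerating the per-interval slowdown of data left in the slower tier) until the accumulated slowdown equals the one-time "purchase" price of migrating it, then migrate. The sketch below is a minimal illustration under that assumption only; the class `SkiRentalMigrator` and its `penalty` and `migration_cost` inputs are hypothetical, and the paper's actual cost model and migration policy may differ.

```python
from dataclasses import dataclass


@dataclass
class SkiRentalMigrator:
    """Hypothetical break-even migration policy for one block of data.

    Each profiling interval in which the data stays in the slower tier
    costs an estimated `penalty`; moving it costs `migration_cost` once.
    """
    migration_cost: float
    accumulated_penalty: float = 0.0
    migrated: bool = False

    def observe_interval(self, penalty: float) -> bool:
        """Record one interval's estimated penalty; return True if we migrate now."""
        if self.migrated:
            return False  # already moved; no further decision needed
        self.accumulated_penalty += penalty
        # Break-even rule: once the cost already paid for leaving the data
        # in place reaches the one-time migration cost, migrate.
        if self.accumulated_penalty >= self.migration_cost:
            self.migrated = True
            return True
        return False


if __name__ == "__main__":
    policy = SkiRentalMigrator(migration_cost=10.0)
    decisions = [policy.observe_interval(3.0) for _ in range(5)]
    print(decisions)  # [False, False, False, True, False]
```

The deterministic break-even rule never pays more than roughly twice what an optimal policy with perfect knowledge of the future would pay, which is why it is a common starting point when future access patterns are unknown.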
Significance and Impact: As scaling of conventional memory devices has stalled, many high-end computing systems have begun to incorporate alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as higher bandwidth with lower capacity or vice versa, they are typically packaged alongside conventional SDRAM in a heterogeneous memory architecture. To use the different types of memory efficiently, new data management strategies are needed to match application usage to the best available memory technology. However, current proposals for managing heterogeneous memories are limited because they either (1) do not consider high-level application behavior when assigning data to different types of memory or (2) require a separate program execution (with a representative input) to collect information about how the application uses memory resources.
This work presents a new data management toolset to address the limitations of existing approaches for managing complex memories. It extends the application runtime layer with automated monitoring and management routines that assign application data to the best tier of memory based on previous usage, without any need for source code modification or a separate profiling run. The paper evaluates this approach on a state-of-the-art server platform with both conventional DDR4 SDRAM and non-volatile Intel Optane DC memory, using both memory-intensive high-performance computing (HPC) applications and standard benchmarks. Overall, the results show that the new approach improves program performance significantly compared to a standard unguided approach across a variety of workloads and system configurations. The HPC applications exhibit the largest benefits, with speedups ranging from 1.4x to 7x in the best cases. Additionally, the work shows that this approach achieves performance similar to that of a comparable offline profiling-based approach after a short startup period, without requiring a separate program execution or offline analysis steps.
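Assigning data "to the best tier of memory based on previous usage" can be pictured as a capacity-constrained ranking problem. The sketch below assumes a hypothetical per-allocation-site profile (`SiteProfile`, holding a size and a sampled access count) and applies a simple greedy heuristic: rank sites by accesses per byte and fill the fast DRAM tier until its capacity is exhausted, sending the rest to the large-capacity tier. This approximates a 0/1 knapsack and is illustrative only; it is not presented as the toolset's actual placement algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical usage profile for one allocation site."""
    site: str        # illustrative identifier for the allocation site
    size_bytes: int  # bytes currently allocated at this site
    accesses: int    # sampled memory accesses attributed to this site


def assign_tiers(profiles: list[SiteProfile], fast_capacity: int) -> dict[str, str]:
    """Greedy knapsack approximation: densest (hottest per byte) data goes to DRAM first."""
    ranked = sorted(
        profiles,
        key=lambda p: p.accesses / max(p.size_bytes, 1),  # access density
        reverse=True,
    )
    placement: dict[str, str] = {}
    remaining = fast_capacity
    for p in ranked:
        if p.size_bytes <= remaining:
            placement[p.site] = "DRAM"   # fits in the fast tier
            remaining -= p.size_bytes
        else:
            placement[p.site] = "NVRAM"  # spill to the large-capacity tier
    return placement


if __name__ == "__main__":
    profiles = [
        SiteProfile("a", size_bytes=4, accesses=400),
        SiteProfile("b", size_bytes=8, accesses=80),
        SiteProfile("c", size_bytes=2, accesses=100),
    ]
    print(assign_tiers(profiles, fast_capacity=6))
    # {'a': 'DRAM', 'c': 'DRAM', 'b': 'NVRAM'}
```

In an online setting, a ranking like this could be recomputed periodically as new profile data arrives, with only the sites whose placement changes becoming candidates for migration.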
Research Details: Despite their potential benefits, heterogeneous memory architectures present new challenges for data management. Computing systems have traditionally viewed memory as a single homogeneous address space, sometimes divided into different non-uniform memory access (NUMA) domains, but consisting entirely of the same storage medium (i.e., double data rate (DDR*) synchronous DRAM (SDRAM)). To use heterogeneous resources efficiently, alternative strategies are needed to match data to the appropriate technology, taking into account hardware capabilities, application usage, and, in some cases, the NUMA domain.
Spurred by this problem, the architecture and systems communities have proposed a range of hardware and software techniques to manage data efficiently on heterogeneous memory systems. The existing solutions exhibit various advantages, disadvantages, and tradeoffs. Most hardware-based techniques offer greater ease of use and software transparency at the expense of flexibility and efficiency, while software-based solutions provide finer-grained control of data placement (and, thus, better performance) in exchange for additional effort from developers and users. Unfortunately, there is currently no silver bullet: the more flexible and efficient software-based approaches still require significant effort (and, in many cases, expert knowledge) to be effective.
To fill this gap, the team of ORNL, Intel and University of Tennessee researchers began developing a hybrid data management solution for complex memory systems based on automated application guidance. The hybrid approach combines source code analysis and offline architectural profiling with online components that collect and apply application-level memory tiering guidance during production execution, without the need for a separate profiling run. The work evaluates the online approach using high-performance computing (HPC) as well as standard (SPEC CPU) computing benchmarks on an Intel Cascade Lake platform with two tiers of memory: conventional DDR4 SDRAM and non-volatile Optane DC. The experiments show that the updated toolset generates effective tiering guidance with very low overhead and typically achieves performance similar to that of the previous offline profiling-based approach after a short initial startup period.
PI: Terry Jones (ORNL)
Sponsor/Funding: DOE/ECP SICM – program manager Doug Kothe.
Citation and DOI: M. Ben Olson, Brandon Kammerdiener, Michael R. Jantz, Kshitij A. Doshi and Terry Jones. “Online Application Guidance for Heterogeneous Memory Systems”. ACM Transactions on Architecture and Code Optimization (TACO), Volume 19, Issue 3. September 2022, Article No.: 45, pp 1–27. https://doi.org/10.1145/3533855