Abstract
Memory subsystems contribute significantly to the performance and energy efficiency of high-performance computing (HPC) applications. Traditional memory technologies with conventional organization (e.g., DRAM) are struggling to keep up with the increasing memory requirements of modern applications. Techniques such as multilevel cache hierarchies and out-of-order execution still fall short of mitigating the penalty incurred by memory accesses. Processing-in-memory (PIM), which moves memory-intensive kernels to memory for execution instead of bringing the data to the processing unit, is emerging as a promising technique. PIM has recently gained traction among computer architecture researchers, and the increasing research activity surrounding it indicates its potential to alleviate main-memory performance bottlenecks. In this paper, we characterize and identify memory-intensive HPC kernels, perform a first-order evaluation of the PIM technique for selected HPC kernels, quantify performance deviation, and analyze the key factors that affect PIM efficiency.