Abstract
This paper proposes an efficient data memory management approach and new data transfer policies for the Intelligent RuntIme System (IRIS) heterogeneous computing framework. IRIS provides a task-based programming model for extremely heterogeneous computing (e.g., CPU, GPU, DSP, FPGA) with support for today's most important programming models (e.g., OpenMP, OpenCL, CUDA, HIP, OpenACC). However, the IRIS framework either forces the programmer to issue explicit data transfer commands for each task or relies on suboptimal memory management for automatic and transparent data transfers. The work described here extends IRIS with a novel heterogeneous memory handler, the Distributed data MEMory handler (DMEM), and introduces new data transfer policies for efficient movement of data among the various computing resources. The proposed approach achieves performance gains of up to 7× for tiled LU factorization and tiled DGEMM (i.e., matrix multiplication) benchmarks and reduces data transfers by up to 71% compared with IRIS's previous heterogeneous memory management handlers. This work also compares the performance of DMEM with the StarPU runtime and the MAGMA math library for GPUs; experiments show performance gains of up to 1.95× over StarPU and 2.1× over MAGMA.