Abstract
Tiling matrix operations can improve the load balancing and performance of applications on heterogeneous computing resources. However, writing a tile-based algorithm for each operation with a traditional, hand-tuned tiling approach that uses for loops in C/C++ is cumbersome and error prone. Moreover, a tiling approach must enable and support heterogeneous memory management of data objects, and it should leverage architecture-supported, native tiled-data transfer APIs instead of copying the tiled data to contiguous memory before each transfer. Our tiling framework provides a tiled data structure that supports heterogeneous memory mapping and can be passed as a parameter to a heterogeneous task specification API. We have integrated the tiling framework into MatRIS (Math kernels library using IRIS). IRIS is a heterogeneous runtime framework with a heterogeneous programming model, memory model, and task execution model. Experiments show that the tiling framework improves the programmability of tiled BLAS operations. It also improves performance by about 20% compared with the traditional method, which copies the data to contiguous memory locations for heterogeneous computing.
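The contrast above between packing a tile into contiguous memory and describing it in place can be sketched in C. The `tile_view` struct and the `make_tile`, `tile_get`, and `pack_tile` helpers are hypothetical illustrations, not the MatRIS or IRIS API. The sketch assumes a row-major matrix of doubles whose dimensions are exact multiples of the tile size.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tile descriptor (not the MatRIS/IRIS API): a view into a
   row-major parent matrix, given by a base pointer, the tile shape, and
   the parent's leading dimension (row stride, in elements). */
typedef struct {
    double *base;   /* address of the tile's top-left element */
    size_t rows;    /* tile height */
    size_t cols;    /* tile width */
    size_t ld;      /* leading dimension of the parent matrix */
} tile_view;

/* View of tile (ti, tj) for tiles of size tr x tc; no data is copied.
   Assumes the parent dimensions are multiples of tr and tc. */
static tile_view make_tile(double *A, size_t ld, size_t ti, size_t tj,
                           size_t tr, size_t tc) {
    tile_view t = { A + ti * tr * ld + tj * tc, tr, tc, ld };
    return t;
}

/* Element (i, j) of the tile, read through the strided view. */
static double tile_get(const tile_view *t, size_t i, size_t j) {
    return t->base[i * t->ld + j];
}

/* Traditional approach: pack the tile row by row into a newly allocated
   contiguous buffer (leading dimension == cols) before a transfer.
   Returns NULL on allocation failure; the caller frees the buffer. */
static double *pack_tile(const tile_view *t) {
    double *buf = malloc(t->rows * t->cols * sizeof *buf);
    if (buf == NULL) return NULL;
    for (size_t i = 0; i < t->rows; ++i)
        memcpy(buf + i * t->cols, t->base + i * t->ld,
               t->cols * sizeof *buf);
    return buf;
}
```

The `pack_tile` path costs an extra host allocation and copy for every tile. A strided view can instead be handed directly to a transfer routine that understands pitch. For example, CUDA's `cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind)` could take `base`, `ld * sizeof(double)`, `cols * sizeof(double)`, and `rows` without any packing. Whether a given backend exposes such a routine depends on the architecture.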