Abstract
In keeping with the trend toward heterogeneity in high-performance computing, hardware manufacturers and vendors are developing new architectures and associated software stacks (e.g., libraries) to extract the best possible performance from commonly used kernels (e.g., linear algebra kernels). However, kernels tuned for one architecture are not portable to others. Moreover, the coexistence of different architectures in a single node makes orchestration difficult. To address these challenges, we introduce LaRIS, a portable framework for LAPACK functionalities. LaRIS separates linear algebra algorithms from vendor-library kernels by using the IRIS run time and the IRIS-BLAS library. This abstraction at the algorithm level makes the implementation completely agnostic to the vendor library and architecture. LaRIS relies on the IRIS run time to dynamically select both the vendor-library kernel and a suitable processor architecture during execution. Using LU factorization as a case study, we demonstrate that LaRIS can fully utilize different heterogeneous systems by launching and orchestrating different vendor-library kernels without any change to the source code.
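
The separation between algorithm and kernels described above can be illustrated with a minimal sketch. It is not the LaRIS or IRIS API: the backend names, the `make_dispatcher` helper, and the round-robin selection policy are all hypothetical, and the "vendor" kernels are pure-Python stand-ins. The sketch also assumes an unpivoted tiled LU factorization on a diagonally dominant matrix, a simplification chosen for brevity. The point is only that `tiled_lu` names kernels abstractly, while the choice of backend is made per task at execution time.

```python
from itertools import cycle

# Stand-in kernels (hypothetical). In a real system each backend would wrap a
# different vendor library; here both backends share pure-Python code.
def getrf(A, k0, k1):
    """Unpivoted LU of the diagonal tile A[k0:k1, k0:k1], in place."""
    for p in range(k0, k1):
        for i in range(p + 1, k1):
            A[i][p] /= A[p][p]
            for j in range(p + 1, k1):
                A[i][j] -= A[i][p] * A[p][j]

def trsm_row(A, k0, k1, j0, j1):
    """Solve L_kk X = A[k0:k1, j0:j1] in place (L_kk unit lower triangular)."""
    for p in range(k0, k1):
        for i in range(p + 1, k1):
            for j in range(j0, j1):
                A[i][j] -= A[i][p] * A[p][j]

def trsm_col(A, k0, k1, i0, i1):
    """Solve X U_kk = A[i0:i1, k0:k1] in place (U_kk upper triangular)."""
    for p in range(k0, k1):
        for i in range(i0, i1):
            A[i][p] /= A[p][p]
            for j in range(p + 1, k1):
                A[i][j] -= A[i][p] * A[p][j]

def gemm(A, i0, i1, j0, j1, k0, k1):
    """Trailing update: A[i0:i1, j0:j1] -= A[i0:i1, k0:k1] @ A[k0:k1, j0:j1]."""
    for i in range(i0, i1):
        for j in range(j0, j1):
            s = 0.0
            for p in range(k0, k1):
                s += A[i][p] * A[p][j]
            A[i][j] -= s

KERNELS = {"getrf": getrf, "trsm_row": trsm_row, "trsm_col": trsm_col, "gemm": gemm}
# Hypothetical backend names; both map to the same stand-in kernels.
BACKENDS = {"backend_a": dict(KERNELS), "backend_b": dict(KERNELS)}

def make_dispatcher(backends):
    """Return (dispatch, log). The backend is chosen per task when it runs.

    Round-robin is an assumed placeholder policy, not the IRIS scheduler.
    """
    order = cycle(sorted(backends))
    log = []

    def dispatch(kernel, *args):
        name = next(order)
        backends[name][kernel](*args)
        log.append((kernel, name))

    return dispatch, log

def tiled_lu(A, nb, dispatch):
    """Right-looking tiled LU (no pivoting), written only against kernel names."""
    n = len(A)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        dispatch("getrf", A, k0, k1)
        for j0 in range(k1, n, nb):
            dispatch("trsm_row", A, k0, k1, j0, min(j0 + nb, n))
        for i0 in range(k1, n, nb):
            dispatch("trsm_col", A, k0, k1, i0, min(i0 + nb, n))
        for i0 in range(k1, n, nb):
            for j0 in range(k1, n, nb):
                dispatch("gemm", A, i0, min(i0 + nb, n), j0, min(j0 + nb, n), k0, k1)
    return A
```

Swapping or adding a backend in this sketch changes only the `BACKENDS` table and the dispatcher policy; `tiled_lu` itself stays untouched, which mirrors the "no change in the source code" property claimed above. The tasks here run sequentially, so dependency tracking between tiles, which a real run time would have to handle, is not modeled.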