Programmer productivity and performance portability are two of the most important challenges facing users of exascale architectures that include heterogeneous compute nodes, deep memory hierarchies, and persistent memory. Library and application developers targeting these architectures will find it increasingly difficult to meet these two challenges without integrated capabilities that allow for flexibility, composability, and interoperability across a mixture of programming, runtime, and architectural components. The PROTEAS-TUNE project is developing a set of programming technologies that will provide developers with portable programming solutions for exascale architectures.
The PROTEAS-TUNE project focuses on performance portability and productivity across increasingly diverse and complex architectures. Key capabilities include support for heterogeneous computing, performance analysis, autotuning, programming nonvolatile memory, code transformations, and just-in-time compilation. In particular, the PROTEAS-TUNE team is developing several critical pieces of infrastructure and optimizations that enable application portability and high performance on exascale architectures, and is contributing them to the community LLVM compiler project.
The PROTEAS-TUNE team is (1) improving the core LLVM compiler ecosystem; (2) designing and implementing the OpenACC heterogeneous programming model for LLVM; (3) using performance modeling and optimization to enable code transformation and performance portability; (4) refining autotuning for the OpenMP and OpenACC programming models to directly target the challenges of heterogeneous architectures; (5) improving performance measurement and analysis tools for exascale architectures and applying them to improve application performance; (6) developing and implementing portable software abstractions for managing persistent memory; and (7) actively engaging library and application developers to adopt the project's technologies.
Progress to date
- The PROTEAS-TUNE team used modeling and performance optimization on proxy applications to evaluate application kernel speedup on pre-exascale architectures. This work has driven improvements in threading, data layout, communication, and the use of specialized hardware capabilities.
- The team built and demonstrated an initial implementation of the OpenACC heterogeneous programming model for LLVM.
- The team added multiple new capabilities to its performance measurement and analysis tools to support accelerator profiling, new programming models and languages, and workflows.
- The team developed and implemented a portability abstraction for using nonvolatile memory on pre-exascale architectures.