Abstract
This work focuses on tools for investigating algorithm performance at extreme scale with millions of concurrent threads and for evaluating the impact of future architecture choices to facilitate the co-design of high-performance computing (HPC) architectures and applications. The approach focuses on lightweight simulation of extreme-scale HPC systems with the needed amount of accuracy. The prototype presented in this paper is able to provide this capability using a parallel discrete event simulation (PDES), such that a Message Passing Interface (MPI) application can be executed at extreme scale, and its performance properties can be evaluated. The results of an initial prototype are encouraging as a simple 'hello world' MPI program could be scaled up to 1,048,576 virtual MPI processes on a four-node cluster, and the performance properties of two MPI programs could be evaluated at up to 16,384 virtual MPI processes on the same system.