Skip to main content
SHARE
Publication

Durango: Scalable Synthetic Workload Generation for Extreme-Scale Application Performance Modeling and Simulation...

Publication Type
Conference Paper
Book Title
SIGSIM-PADS '17: Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
Publication Date
Page Numbers
97 to 108
Conference Name
SIGSIM-PADS '17
Conference Location
Singapore, Singapore
Conference Sponsor
ACM
Conference Date
-

Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next generation supercomputing systems because it is impractical to construct prototype systems at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces that can be difficult to work with because of their size and lack of flexibility to extend or scale up without rerunning the original application. To address this problem, we have created a new technique for generating scalable, flexible workloads from real applications, we have implemented a prototype, called Durango, that combines a proven analytical performance modeling language, Aspen, with the massively parallel HPC network modeling capabilities of the CODES framework.

Our models are compact, parameterized and representative of real applications with computation events. They are not resource intensive to create and are portable across simulator environments. We demonstrate the utility of Durango by simulating the LULESH application in the CODES simulation environment on several topologies and show that Durango is practical to use for simulation without loss of fidelity, as quantified by simulation metrics. During our validation of Durango's generated communication model of LULESH, we found that the original LULESH miniapp code had a latent bug where the MPI_Waitall operation was used incorrectly. This finding underscores the potential need for a tool such as Durango, beyond its benefits for flexible workload generation and modeling.

Additionally, we demonstrate the efficacy of Durango's direct integration approach, which links Aspen into CODES as part of the running network simulation model. Here, Aspen generates the application-level computation timing events, which in turn drive the start of a network communication phase. Results show that Durango's performance scales well when executing both torus and dragonfly network models on up to 4K Blue Gene/Q nodes using 32K MPI ranks, Durango also avoids the overheads and complexities associated with extreme-scale trace files.