Validation and Uncertainty Assessment of Extreme-Scale HPC Simulation through Bayesian Inference


Simulation of high-performance computing (HPC) systems plays a critical role in their development, especially as HPC moves toward the co-design model used for embedded systems, which ties hardware and software into a unified design cycle. Exploring system-wide tradeoffs in hardware, middleware and applications using high-fidelity cycle-accurate simulation, however, is far too costly. Coarse-grained methods can provide efficient, accurate simulation, but they require rigorous uncertainty quantification (UQ) before their results can be used to support design decisions. We present SST/macro, a coarse-grained structural simulator that provides flexible congestion models for low-cost simulation. We explore the accuracy limits of coarse-grained simulation by deriving error distributions of model parameters using Bayesian inference. Propagating these uncertainties through the model, we demonstrate SST/macro’s utility in drawing conclusions about performance tradeoffs for a series of MPI collectives. Low-cost, high-accuracy simulation coupled with UQ methodology makes SST/macro a powerful tool for rapidly prototyping systems to aid extreme-scale HPC co-design.
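As a rough illustration of the workflow summarized above (infer distributions for model parameters, then propagate them to a predicted collective time), the following is a minimal sketch and not SST/macro's implementation or API. It assumes a simple latency-plus-inverse-bandwidth message cost, Gaussian timing noise, flat priors, a random-walk Metropolis sampler, and a congestion-free binomial-tree broadcast. All function names, parameter values, and the synthetic "measurements" are hypothetical.

```python
import math
import random


def log_posterior(alpha, inv_bw, sigma, data):
    """Log posterior for t = alpha + n * inv_bw + Normal(0, sigma) noise.

    Flat priors on positive parameters are an assumption of this sketch.
    """
    if alpha <= 0 or inv_bw <= 0 or sigma <= 0:
        return -math.inf
    total = 0.0
    for n_kb, t_us in data:
        resid = t_us - (alpha + n_kb * inv_bw)
        total += -0.5 * (resid / sigma) ** 2 - math.log(sigma)
    return total


def metropolis(data, start, steps, n_samples, burn_in, rng):
    """Random-walk Metropolis over (alpha, inv_bw, sigma)."""
    current = list(start)
    cur_lp = log_posterior(*current, data)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, steps)]
        lp = log_posterior(*proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp - cur_lp)):
            current, cur_lp = proposal, lp
        if i >= burn_in:
            samples.append(tuple(current))
    return samples


def broadcast_time(alpha, inv_bw, n_kb, ranks):
    """Hypothetical binomial-tree broadcast cost, ignoring congestion."""
    rounds = (ranks - 1).bit_length()  # ceil(log2(ranks)) for ranks >= 1
    return rounds * (alpha + n_kb * inv_bw)


def run_demo(seed=0):
    rng = random.Random(seed)
    # Synthetic ping-pong timings (microseconds) for message sizes in KB,
    # generated from assumed "true" values: 2 us latency, 1 us/KB, 0.5 us noise.
    data = [(n, 2.0 + 1.0 * n + rng.gauss(0.0, 0.5))
            for n in (1, 2, 4, 8, 16, 32, 64) for _ in range(5)]
    samples = metropolis(data, start=(1.0, 1.0, 1.0),
                         steps=(0.1, 0.004, 0.05),
                         n_samples=20000, burn_in=5000, rng=rng)
    # Propagate each posterior sample through the broadcast model.
    preds = sorted(broadcast_time(a, b, 16, 1024) for a, b, _ in samples)
    mean = sum(preds) / len(preds)
    lo = preds[int(0.025 * len(preds))]
    hi = preds[int(0.975 * len(preds))]
    return mean, lo, hi


if __name__ == "__main__":
    mean, lo, hi = run_demo()
    print(f"predicted broadcast time: {mean:.1f} us "
          f"(95% interval {lo:.1f} to {hi:.1f} us)")
```

The point of the sketch is that a design comparison is made on predictive intervals rather than point estimates: two candidate configurations can only be distinguished if their propagated intervals separate.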

European Conference on Parallel Processing