Effective utilization of novel hybrid architectures of pre-exascale and exascale machines requires transformations to global climate modeling systems that may not reproduce the original model solution bit-for-bit. Round-off level differences grow rapidly in these non-linear and chaotic systems. This makes it difficult to isolate bugs/errors from innocuous growth expected from round-off level differences. Here, we apply two modern multivariate two sample equality of distribution tests to evaluate statistical reproducibility of global climate model simulations using standard monthly output of short (~ 1-year) simulation ensembles of a control model and a modified model of US Department of Energy's Energy Exascale Earth System Model (E3SM). Both the tests are able to identify changes induced by modifications to some model tuning parameters. We also conduct formal power analyses of the tests by applying them on designed suites of short simulation ensembles each with an increasingly different climate from the control ensemble. The results are compared against those from another such test. These power analyses provide a framework to quantify the degree of differences that can be detected confidently by the tests for a given ensemble size (sample size). This will allow model developers using the tests to make an informed decision when accepting/rejecting an unintentional non-bit-for-bit change to the model solution.