The U.S. Air Force and the Department of Energy’s Oak Ridge National Laboratory launched a new high-performance weather forecasting computer system that will provide a platform for some of the most advanced weather modeling in the world.
Procured and managed by ORNL’s National Center for Computational Sciences, the system comprises two Hewlett Packard Enterprise, or HPE, Cray EX supercomputers and will primarily support work by Air Force Weather, which provides the U.S. Army and Air Force with global and regional numerical weather model outputs for planning and executing missions worldwide.
Based at Offutt Air Force Base in Nebraska, the Air Force Weather Wing traces its heritage to a meteorological service unit in the Army Signal Corps during World War I and was formally established with the formation of the Army Air Forces’ Weather Wing in 1943. The two supercomputers have been dubbed “Fawbush” and “Miller” after Air Force meteorologists Maj. Ernest Fawbush and Capt. Robert Miller, who made the first operational tornado forecast in history at Tinker Air Force Base in 1948.
“Now we have achieved a major milestone in our partnership with Oak Ridge National Laboratory,” said Col. Gary Kubat, acting Air Force Weather director. “The delivery of Fawbush and Miller represent a seminal moment in the evolution of Air Force Weather analysis and forecast capabilities.”
“Just as Maj. Fawbush and Capt. Miller drove a revolutionary change in military and public weather forecasting, the two halves of this high-performance computing system will open doors to critical new capabilities. We are now able to provide substantially higher fidelity global and regional forecasts to meet the operational needs of the Air Force, Space Force, Army and our other supported customers, while also having a much greater modeling and machine learning development environment to fast-track innovations into operations,” Kubat added.
The system’s new levels of performance will immediately enable Air Force weather researchers to run their current simulations at a much higher resolution, going from 17 kilometers between model grid points to 10 kilometers, resulting in more precise forecasts. Such weather predictions are vital to the success of military missions around the world.
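To put the resolution change in perspective, the short Python sketch below estimates how many more horizontal grid columns a 10-kilometer grid needs than a 17-kilometer grid. It is a back-of-envelope illustration only: it assumes a uniform grid over the whole globe and an approximate Earth surface area, neither of which comes from Air Force Weather, and the function names are hypothetical.

```python
# Back-of-envelope estimate; not derived from the Air Force Weather model itself.
EARTH_SURFACE_KM2 = 510.1e6  # approximate surface area of Earth (assumption)


def horizontal_columns(spacing_km: float) -> float:
    """Approximate number of grid columns for a uniform grid with this spacing."""
    return EARTH_SURFACE_KM2 / spacing_km**2


def column_ratio(old_km: float, new_km: float) -> float:
    """How many times more columns the finer grid needs than the coarser one."""
    return (old_km / new_km) ** 2


old_spacing_km, new_spacing_km = 17.0, 10.0  # spacings quoted in the article
ratio = column_ratio(old_spacing_km, new_spacing_km)

print(f"Columns at {old_spacing_km:.0f} km: {horizontal_columns(old_spacing_km):,.0f}")
print(f"Columns at {new_spacing_km:.0f} km: {horizontal_columns(new_spacing_km):,.0f}")
print(f"Increase in horizontal columns: about {ratio:.2f}x")
```

Under these assumptions the finer grid needs roughly 2.9 times as many columns. Finer grids generally also call for shorter model time steps, so the real increase in computing cost is likely larger than this ratio alone suggests.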
“Looking to the future it is imperative that we take this step to ensure continued U.S. and allied dominance relative to our strategic competitors,” Kubat said. “This system has the growth potential to continue to meet emerging warfighter needs while enabling Air Force Weather to stay in lock-step with our modeling partners as we together develop the next generation weather models.”
The HPE Cray EX supercomputer architecture is an entirely new design aimed at handling a variety of supercomputing tasks, including modeling, simulation, AI and analytics workloads. The Air Force Weather forecasting system marks the first installation of the HPE Cray EX supercomputer in a federal facility. Using 2nd Gen AMD EPYC™ processors in 800 nodes across four cabinets per system, Fawbush and Miller are capable of a peak performance of 7.2 petaflops (one petaflop is one quadrillion floating-point operations per second), or at least 6.5 times the performance of Air Force Weather’s current computer system. Also, each system can be expanded to a total of 1,024 nodes, allowing for the future installation of GPUs that can accelerate its floating-point performance tenfold.
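The figures above can be checked with a few lines of arithmetic. The Python sketch below simply takes the quoted numbers at face value; the variable names are illustrative, and because the new system is described as "at least" 6.5 times faster, the result for the previous system is an upper bound rather than a published specification.

```python
# Figures quoted in the article.
PEAK_PETAFLOPS = 7.2       # quoted peak performance of Fawbush and Miller
SPEEDUP_AT_LEAST = 6.5     # "at least 6.5 times" the current system
NODES_INSTALLED = 800      # nodes per system today
NODES_MAX = 1024           # nodes per system after expansion

# One petaflop is 1e15 floating-point operations per second.
peak_flops = PEAK_PETAFLOPS * 1e15

# "At least 6.5 times" implies the previous system peaks at no more than this.
previous_peak_upper_bound = PEAK_PETAFLOPS / SPEEDUP_AT_LEAST

# Headroom left in each system for additional nodes.
spare_node_slots = NODES_MAX - NODES_INSTALLED
expansion_factor = NODES_MAX / NODES_INSTALLED

print(f"Peak: {peak_flops:.2e} floating-point operations per second")
print(f"Previous system: at most about {previous_peak_upper_bound:.2f} petaflops")
print(f"Spare node slots per system: {spare_node_slots} ({expansion_factor:.2f}x nodes)")
```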
While the multi-purpose capabilities of the HPE Cray EX supercomputer architecture will help further Air Force Weather’s forecasting capabilities, ORNL’s unique expertise in high-performance computing operations will provide unflagging support for these mission-critical machines.
Jim Rogers, ORNL’s computing and facilities director, said great care went into ensuring the system will always remain online. Each computer has its own unique power source with a dedicated power line.
“We spent a significant amount of time on the design side working on the resiliency of this machine – the facilities, the power, the cooling and the system design – so that we can always be available,” Rogers said. “So, regardless of what’s going on, whether we have a scheduled or unscheduled outage, whatever the cause, there will always be more than enough computing capability available for them to get their work done.”
Dustin Leverman, head of ORNL’s HPC Storage and Archive Group, will serve as program manager for the new system and notes that it will include new failsafe features through the Slurm resource scheduler.
“The HPC11 system has two instantiations of compute resources and storage resources that can be used as failover for each other. This will allow for much higher system uptime because, in the event that a compute or storage resource needs hardware or software maintenance, we should be able to fail over to the other system,” Leverman said.
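The failover idea Leverman describes can be illustrated with a minimal Python sketch. This is not how Slurm or ORNL implements it; the `System` class and `pick_system` function are hypothetical names used only to show the pattern of preferring one system and falling back to its twin during maintenance.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical stand-in for one half of the dual system."""
    name: str
    available: bool = True


def pick_system(preferred: System, standby: System) -> System:
    """Return the preferred system if it is up, otherwise fail over to the standby."""
    if preferred.available:
        return preferred
    if standby.available:
        return standby
    raise RuntimeError("neither system is available")


fawbush = System("Fawbush")
miller = System("Miller")

# Normal operation: work goes to the preferred system.
print(pick_system(fawbush, miller).name)

# Hypothetical maintenance window on the preferred system: work fails over.
fawbush.available = False
print(pick_system(fawbush, miller).name)
```

In a real deployment the availability flag would come from the scheduler and monitoring tools rather than being set by hand.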
The Air Force’s workhorse for numerical weather prediction modeling is the Global Air Land Weather Exploitation Model, which is based on the Unified Model developed by the United Kingdom’s Met Office. However, Air Force Weather is also developing its own specialized models. Enabled by Fawbush and Miller’s computational power – and the potential of even more speed via GPUs – the Air Force is looking to introduce all-new forecasting capabilities in the next several years, which will involve collaborative research with ORNL’s Computational Earth Sciences Group.
One example is full physics cloud forecasting, which uses cloud microphysical parameters to improve short- to medium-term forecasts that go beyond the capabilities of standard statistical regression models.
“We’re interested in clouds like nobody else – most people just care if it’s going to be cloudy or sunny,” said Frank Ruggiero, lead engineer of the Air Force’s Numerical Weather Modeling Program. “But a lot of our missions are very dependent on clouds – if an area is cloud-covered, then you have to address it a different way. If you’re doing any sort of remote sensing, clouds are going to make a big difference.”
Another specialized model that Air Force Weather will introduce on the new system is a global hydrology model to forecast stream flow, flooding or inundation – how much land area is going to be under water and at what depth. This will involve accurately mapping and running calculations on hundreds of watersheds.
Ruggiero adds that another advantage to using dual systems, beyond redundancy, will be dynamic load balancing. They plan to run production on both machines and then use available capacity on the systems for research.
“That gives us a pretty good amount of flexibility and performance enhancement by spreading the load out that way. It will allow us to surge operations if we have to. We won’t be constrained by a physical partition,” Ruggiero said. “It’s truly a game-changing capability.”
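A simple way to picture this kind of load spreading is a greedy placement rule: each job goes to whichever system currently has the most free nodes. The Python sketch below is only an illustration under that assumption. The job names and node counts are invented, the `place_jobs` function is hypothetical, and the article does not describe the actual scheduling policy; only the 800-node capacity per system comes from the figures quoted earlier.

```python
def place_jobs(jobs: list[tuple[str, int]], capacity: dict[str, int]):
    """Greedily place each job on the system with the most free nodes."""
    loads = {name: 0 for name in capacity}
    placement = {}
    for job, nodes in jobs:
        # Pick the system with the largest number of free nodes right now.
        target = max(capacity, key=lambda name: capacity[name] - loads[name])
        if capacity[target] - loads[target] < nodes:
            raise RuntimeError(f"not enough free nodes for {job}")
        loads[target] += nodes
        placement[job] = target
    return placement, loads


# 800 nodes per system, as quoted in the article.
capacity = {"Fawbush": 800, "Miller": 800}

# Hypothetical mix of production and research work (node counts are invented).
jobs = [
    ("production-global", 400),
    ("production-regional", 300),
    ("research-clouds", 200),
    ("research-hydrology", 150),
]

placement, loads = place_jobs(jobs, capacity)
for job, system in placement.items():
    print(f"{job} -> {system}")
print(loads)
```

Because neither machine is reserved for a single purpose, research jobs in this sketch simply fill whatever capacity production leaves free on either system.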
UT-Battelle manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science. – Coury Turczyn