There’s a good reason research institutions keep pushing for faster supercomputers: They allow researchers to develop more realistic simulations than slower machines can support. This is indispensable for scientists and engineers striving to understand the workings of the universe or to create powerful new technologies.
ORNL’s next step in this endeavor will be Summit, a system that will rely on traditional central processing units, which have been running computers for decades, combined with graphics processing units, which were created more recently to accelerate video processing.
Summit is being built by IBM and powered by two as-yet-unreleased processors: IBM’s own Power9 CPUs and Volta GPUs from NVIDIA. It is part of a three-supercomputer acquisition coordinated by DOE among ORNL, Lawrence Livermore National Laboratory in California and Argonne National Laboratory in Illinois.
At a research institution such as ORNL, supercomputers are used to accurately simulate the physical world, but the physical world is a complicated place. To do their job well, these machines must process a lot of information on vastly different scales, from subatomic particles moving unbelievably fast to galaxies nearly as old as the universe. As a result, researchers need more powerful computers.
Progress has been impressive. The world’s most powerful computer today is more than 400 times faster than the top machine a decade ago, and that system was in turn more than 400 times faster than the top machine a decade before that. ORNL is already home to the world’s second fastest supercomputer, Titan (see “Titan has a very good year,” page 7), which can chew through as many as 27 million billion calculations each second, or 27 petaflops, but there are many pressing questions that not even Titan can answer.
Summit, which will be available to users in 2018, will be another five to 10 times more powerful than Titan. That means it will have a peak performance somewhere between 150 and 300 petaflops. For the Oak Ridge Leadership Computing Facility, however, that’s not really the point.
“We really don’t like to talk about it in terms of peak performance, because peak performance really is not the most important thing,” said OLCF Project Director Buddy Bland. “Peak performance is just the biggest number on your speedometer. It doesn’t say how fast your car will really go. We selected the system based on its predicted performance on full applications.”
Getting ready for Summit
The drive to prepare applications for Summit will begin with the Center for Accelerated Application Readiness (see “Early Summit projects,” page 7). CAAR will include 13 advanced applications chosen for their potential to produce scientific breakthroughs on the new machine. Application developers will work with teams that include experts from the OLCF, IBM and NVIDIA. Their goal will be both to optimize the applications for use on Summit and to develop best practices for developers preparing to run on the machine.
Their job will be made a little easier by the architecture OLCF officials picked for Summit. In choosing a system containing CPUs and GPUs, they are following the same path they started down with Titan. Summit will have fewer nodes (3,400, compared to 18,600 on Titan), and the nodes will be much more powerful. It will be able to perform many more calculations simultaneously, meaning developers will have to divide computing jobs into more and smaller pieces in a process known as parallelism.
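The idea of parallelism described above, splitting one large computing job into many smaller, independent pieces that run at the same time, can be illustrated with a minimal sketch. This example is purely illustrative and has nothing to do with Summit’s actual software; the function and chunk sizes are invented to show how a single sum can be divided among several worker processes:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker computes its own small piece of the overall job.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(n, workers=4):
    # Divide the full range of work into `workers` smaller chunks;
    # because the chunks are independent, they can run simultaneously.
    data = list(range(n))
    size = (n + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, n, size)]
    with Pool(workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # Combine the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    # Same result as the serial computation, but the work was
    # split into pieces that ran on separate processes.
    print(parallel_sum_of_squares(1000))
```

On a machine like Summit the same principle applies at vastly larger scale: the more processing elements a node offers, the more (and smaller) the pieces a developer must carve a simulation into.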
A familiar process
Still, the process is fundamentally similar.
“For users who are running effectively on Titan and taking advantage of the GPUs, I think moving to Summit will be relatively easy,” Bland said. “But it’s only relatively easy. You still have to find more parallelism in the code.”