Moving to Titan
ORNL prepares for a leap in computing power
- Sidebar: Preparing users for Titan
Titan will combine two very different types of processors and will require users to adapt their applications accordingly.
- Sidebar: Science at 20 petaflops
ORNL's Center for Accelerated Application Readiness is working with six world-class applications to show how Titan's combination of GPUs and CPUs can revolutionize computational science.
Titan's leap in power will result from the addition of thousands of graphics processing units. GPUs provide a major boost in computing power without a corresponding increase in the electricity required to run the system. Photo: Jason Richards
When Titan is available to users in the first half of 2013, it will have a peak performance of 20 petaflops, or 20,000 trillion calculations each second, making it more powerful than any computer now in existence and at least six times more powerful than ORNL's current Jaguar system.
This leap in power will come from a new approach to computing: the use of accelerators. Titan will use thousands of graphics processing units, or GPUs, which have been developed since the 1980s to improve the video performance of computers and the look and feel of computer gaming. The addition of GPUs in supercomputing provides a major boost in computing power without a corresponding boost in the electricity required to run the system.
Gordon Moore and computer speed
The introduction of accelerators is at least the third strategy the computing world has used to get more out of a system.
For decades, advances came from shrinking transistors to fit more onto a sliver of silicon. Chip-makers did an admirable job of this; indeed, when Intel co-founder Gordon Moore famously predicted in 1965 that the number of transistors on a processor would double every two years, he was looking forward only a decade. Clearly he underestimated the industry he helped to found.
Nevertheless, advances have slowed in recent years.
"Since 2004, microprocessors have not gotten any faster on single-core performance," notes Buddy Bland, project director for the Oak Ridge Leadership Computing Facility (OLCF). "But Moore's Law, which people interpret as saying performance will double every 24 months, is still alive.
"So how do you reconcile these two statements? Moore's Law has gone into adding more transistors, and those additional transistors have been used to add more cores. You get more performance on the processor through more parallelism, not through increased single-thread performance."
In home computers, this approach can be seen in dual-core or quad-core chips. In the largest supercomputers, it can be seen in systems that incorporate hundreds of thousands of computing cores. Jaguar, for instance, combines 18,688 16-core processors for a total of 299,008 separate processor cores.
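Bland's point about getting performance from parallelism rather than single-thread speed can be sketched in a few lines. The illustrative Python below (not ORNL code; the function names are our own) splits one large sum into chunks and hands each chunk to a separate processor core, the same divide-and-conquer strategy multi-core chips rely on.

```python
# Illustrative sketch: one large calculation split across several cores.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum one chunk of the range; each worker core handles one chunk."""
    start, end = bounds
    return sum(range(start, end))

def parallel_sum(n, workers=4):
    # Divide the full range [0, n) into one chunk per worker.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with Pool(workers) as pool:
        # Each chunk is computed on its own core; the partial results
        # are combined at the end.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(1_000_000))
```

No single worker runs faster than a lone core would, but together they finish sooner; that is the trade multi-core processors, and Jaguar's 299,008 cores, make.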
Titan has been developed with the recognition that even this level of parallelism has limits. Traditional computer chips, known as central processing units, are fast but power-hungry. If supercomputers are to move beyond the petascale to the next major milestone—exascale computers capable of a million trillion calculations per second—they must find a more energy-frugal way to do it.
"The traditional microprocessors are optimized to make single threads go really fast," Bland explains, "but at a cost of using more power per calculation than necessary. There are several ways being developed to get more parallelism without increasing the power consumption of the processors dramatically, and one of those ways is using GPGPUs, or general purpose graphics processing units."
Researchers moving from single-core to many-core computers had to break their calculations into smaller problems that could be parceled out separately to the different processing cores. This approach was known as parallel computing. Accelerators are pushing parallelism by allowing researchers to divide those smaller problems even further.
"These processors have many, many, many threads of execution," Bland explains. "Each one runs more slowly than a traditional CPU thread of execution, but you have so many of them that it allows you in aggregate to get much higher performance at a similar power consumption."
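As a rough illustration of that finer-grained division, the hypothetical sketch below expresses a simple vector update as one tiny unit of work per array element, which is the style a GPU kernel uses. The serial Python loop is a stand-in: on a real GPU, each call to the "kernel" would run as one of thousands of lightweight concurrent threads, and the aggregate throughput is what wins.

```python
# Sketch of fine-grained, GPU-style decomposition: instead of a few
# cores each taking a large block of work, one tiny task (one array
# element) is assigned to each of many lightweight "threads."

def saxpy_kernel(i, a, x, y):
    """One thread's worth of work: a single element of a*x + y."""
    return a * x[i] + y[i]

def saxpy(a, x, y):
    # Launch one logical thread per element. Here they run one after
    # another; on a GPU they would run concurrently.
    return [saxpy_kernel(i, a, x, y) for i in range(len(x))]
```

Each thread does almost nothing, and does it more slowly than a CPU core would, but there are enough of them that the whole array is updated at much higher aggregate speed for similar power.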
Upgrade in two phases
Jaguar is being transformed into Titan in two phases.
The first was completed in February 2012. In it, Jaguar's AMD Opteron processors were upgraded to the company's newest 6200 series, and the number of processing cores was increased by a third, from 224,256 to 299,008. In the process, each of Jaguar's 18,688 nodes, or connection points, had its two six-core processors replaced with a single 16-core processor. At the same time, the system's interconnect was updated and its memory was doubled to 600 terabytes.
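The phase-one core counts are internally consistent, as a quick back-of-envelope check shows:

```python
# Checking the phase-one numbers: each of 18,688 nodes went from
# twelve cores (two six-core processors) to sixteen cores.
nodes = 18_688
old_cores = nodes * 12   # before the upgrade
new_cores = nodes * 16   # after the upgrade

assert old_cores == 224_256
assert new_cores == 299_008
# "increased by a third": the added cores equal one third of the old total.
assert new_cores - old_cores == old_cores // 3
```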
In addition, 960 of Jaguar's 18,688 nodes received an NVIDIA GPU. This portion of the system, known as TitanDev, gives OLCF staff and selected researchers a platform for testing approaches to GPU computing (see sidebar: "Preparing users for Titan").
The upgrade increased Jaguar's peak performance from 2.3 to 3.3 petaflops.
The second phase of the upgrade will begin in October 2012. At that time the 960 accelerators will be removed, and most of Titan's nodes will get one of NVIDIA's next-generation Kepler GPUs alongside the existing AMD processor. Each of the Kepler chips will be capable of more than a trillion calculations each second, or 1 teraflop, and the new system as a whole will have a peak performance of at least 20 petaflops.
During this process the OLCF will also replace its file system. Bland says the target for the new system is to move a terabyte of data each second, making it 8,000 times as fast as a typical business Ethernet connection.
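Bland's 8,000-fold figure checks out arithmetically if a "typical business Ethernet connection" is taken to mean one gigabit per second, which is our assumption rather than a detail stated in the article:

```python
# Back-of-envelope check of the 8,000x file-system figure, assuming
# a "typical business Ethernet connection" means 1 gigabit per second.
file_system_bytes_per_s = 1e12               # target: 1 terabyte per second
ethernet_bits_per_s = 1e9                    # 1 Gb/s (our assumption)
ethernet_bytes_per_s = ethernet_bits_per_s / 8

speedup = file_system_bytes_per_s / ethernet_bytes_per_s
assert speedup == 8000
```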
As the countdown to Titan continues, facility staff are working overtime to ensure that the new system will deliver valuable results from day one. TitanDev has proved important in this effort, giving the facility's staff scientists and its most sophisticated users a chance to try out approaches to the new architecture.
Bland notes that the first step—identifying more effective ways to divide large problems and spread them out among the available resources—would pay off even without the accelerators.
"We've done a lot of work to help those applications expose more parallelism. In doing that, we have typically doubled the performance of the application before you even use the accelerator, and when you add the accelerator we're typically seeing another doubling of performance," Bland says.
"When we get the next generation of GPUs in the fall, we expect it will be even faster." —Leo Williams