One of the world's premier computing facilities is transforming scientific research.
Many in the scientific community are surprised to learn that the world's most powerful supercomputer is not crunching numbers for classified government projects deep in the bowels of an ultra-secret agency. In fact, it is the computational heart of Oak Ridge National Laboratory's Leadership Computing Facility (LCF), one of the Department of Energy's most successful user facilities, where it is blazing through scientific simulations designed to help develop cleaner sources of energy and to understand the causes and impacts of climate change.
The great majority of research conducted on this analytical juggernaut, known as Jaguar, is anything but secret. Practiced by researchers from universities, corporations and government laboratories, this "open science" philosophy is designed to provide a unique tool for addressing some of the largest and most important scientific challenges. The collection of capabilities that accompany this computing leviathan, including a breadth of scientific talent; an acclaimed support staff; and a formidable computing infrastructure of power, cooling and connectivity, has made Oak Ridge one of the world's premier computational facilities for the delivery of scientific research.
Although Jaguar will provide more than a billion processor hours of computing time in 2010, competition among users for resources is intense. The majority of the time on Jaguar is allocated through INCITE, a program operated in conjunction with the Department of Energy's Office of Advanced Scientific Computing Research. One of the nation's most successful programs of computational research, INCITE selects from among user research proposals by evaluating both the potential and the computational readiness of the research to accelerate scientific discoveries and technological innovations.
Research projects selected to run on Jaguar are often accelerated as a result of the massive amount of analysis conducted by the machine in a relatively brief period, in some cases reducing from months to days the time necessary to generate data. LCF Director of Science Doug Kothe points to the Department of Energy's list of the "Top 10 Scientific Achievements" over the last three years as evidence of the machine's value. "Five of those 10 achievements were the direct result of data enabled through simulations executed on Jaguar," he says.
Because of Jaguar's importance to advancing science in a number of disciplines, the supercomputer runs user experiments virtually non-stop every day of the year, relying on scheduling software to load simulations onto the system as quickly and efficiently as possible. "On any given day, the backlog of jobs waiting in the queue could be several days of simulation time," Kothe notes. "Thirty days of backlog would not be tolerated by the scientists. Likewise, a zero backlog would indicate the machine was underutilized. I am not sure what the ideal backlog would be, but there is no doubt this is a user facility achieving high availability, high utilization and high demand."
At 2.3 petaflops, or 2,300 trillion calculations per second, Jaguar is the world's most powerful computer by a wide margin. Yet the system's most distinctive attribute may be not its speed but its 300 terabytes of memory, about three times that of any comparable supercomputer. Jaguar's abundance of memory enables the storage of the more highly detailed models and equations necessary for simulating various real-world phenomena.
The advantage of this extra memory capacity is illustrated by high-resolution climate models developed by the Computational Climate End Station Project. Headed by climate scientist Warren Washington of the National Center for Atmospheric Research, the project typically develops models that attempt to predict climatic conditions in coming decades and centuries. Washington emphasizes that while the ability to develop ever more detailed models is helpful, high-resolution models are not an end in themselves. "Resolution is important, but we must also make sure our models produce realistic simulations. This translates to improving the details of areas that we could not treat as well in the past." Washington observes that earlier models could specify only general features, such as deserts, forests or grasslands. "With additional computational power, we can now specify species of plants and examine details like how precipitation over mountainous regions migrates into river valleys and eventually flows into the ocean," he says. "With the Oak Ridge resources, we can run our models at much higher resolution than in the past."
Time will tell
From this early vantage point, it's hard to say which user communities have benefited most from Jaguar's rapidly expanding capabilities. "To some extent the answer is in the eyes of the beholder," Kothe says. "We are employing an open science system for research in a number of areas as varied as chemistry, materials science, climate, biology and astrophysics. In each of these fields, we can point to impactful work made possible by Jaguar."
In computational science, as in other scientific disciplines, time is often required for the community at large to appreciate the impact of the work being done. Kothe rarely sees a simulation in progress that can be immediately viewed as "game changing." "The point is that time (measured in years, not weeks or months) is required to know which of these impacts will be significant. We are confident, however, that the impacts will be broad and deep," he says.
The incredible pace of change can make researchers forget that some simulations conducted on Jaguar today were unfathomable even a few months ago. For some of ORNL's users, these capabilities are opening up a new way of thinking about research. "The approach a scientist takes to designing a simulation that runs only once a week or once a year is very different than the approach he or she would take to a simulation run hundreds or thousands of times in a week," Kothe says. "The challenge is to open our minds to a more unconstrained approach to science. The sheer computing power of systems like Jaguar means that scientists are much less apt to limit the complexity of the models they construct. This ability to integrate a higher level of complexity leads to more predictive models that increase the accuracy of the simulated results."
A field of dreams
Just as computing capabilities rapidly change, so does the pecking order among the world's top computing systems. Although Jaguar is the world's top supercomputer today, the title is elusive in a field where ORNL's maximum computing capacity is 800 times greater than it was just five years ago. One constant over this period, however, has been the popularity of the LCF with the facility's users. When pressed about what consistently attracts users to Oak Ridge, Kothe suggests three factors. "First, Jaguar has become a field of dreams for the best scientists. To some extent one can use the cliché, 'if you build it, they will come.'" Second, he emphasizes the ability of the center's staff to meet the understandably complex and often esoteric needs of LCF users. "The ORNL support staff is continually cited by users as being second to none. We have a unique set of experts who are willing and able to help."
Finally, Kothe attributes much of the center's success to a unique support model. "When we stood up the center several years ago, we made a conscious decision to support not hundreds of projects and thousands of users, but dozens of projects and hundreds of users. This decision enabled us to assign our best staff to individual projects. As a result, we are simply not answering mundane questions like, 'Where do I get an account?' or 'How do I run a job?' Our staff members function more like collaborative members of project teams."
"One indicator that the center's model is working is the emulation from other centers," Kothe says. "Our model has been quite effective at enabling Oak Ridge to support the Department of Energy's goal of ground-breaking computational research. For our DOE customer, and for the hundreds of users who each year take advantage of this remarkable program, the impact will be nothing less than profound."
Web site provided by Oak Ridge National Laboratory's Communications and External Relations