FOR COMPUTATIONAL SCIENCES:
The Department of Energy's Center for Computational Sciences at Oak Ridge National Laboratory is housed in the nation's largest unclassified computing facility. The new facility, which is part of ORNL's $300 million campus modernization project, is a 170,000 sq. ft building that includes 40,000 sq. ft for DOE's unclassified scientific computing resources. The spacious new building houses a state-of-the-art computer center and more than 400 staff, including researchers and computational scientists. The facility provides the networking, visualization, data storage capabilities, and electrical and mechanical infrastructure required to support a leadership-class scientific computing system.
Networking the Nation
CCS is the hub for ORNL's fast network connections. CCS has several major networks in place and under construction, enabling it to connect with DOE laboratories, university partners, and other supercomputing centers. Currently, two networks are operating. CCS links with other DOE facilities through the Energy Sciences Network (ESNet), which has an OC-12 connection, that is, a data transfer speed of 622 megabits per second (Mb/s). DOE plans to upgrade ESNet to OC-48, or 2.5 gigabits per second (Gb/s).
CCS also connects to the university user community through the National Science Foundation (NSF) Internet-2 network, which has a data transfer speed of OC-192, or 10 Gb/s. The high-speed network connection was funded by ORNL as an integral part of its strategy to work more closely with a variety of university partners. CCS staff are building four new networks, including the TeraGrid and the UltraScience Net (see Networking: Linking America's Laboratories).
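The OC designations quoted above follow the SONET convention that an OC-n link runs at n times the OC-1 base rate of 51.84 Mb/s. The short Python sketch below reproduces the three rates mentioned in the text and estimates an idealized transfer time for a dataset. The function names and the 1-terabyte example size are assumptions made for illustration, and the estimate ignores protocol overhead and network contention.

```python
# Nominal SONET line rates: OC-n = n x 51.84 Mb/s (OC-1 base rate).
OC1_MBPS = 51.84


def oc_rate_mbps(level: int) -> float:
    """Return the nominal line rate of an OC-<level> link in Mb/s."""
    if level < 1:
        raise ValueError("OC level must be a positive integer")
    return level * OC1_MBPS


def transfer_time_seconds(size_bytes: float, level: int) -> float:
    """Idealized time to move size_bytes over an OC-<level> link.

    Assumes the full line rate is available; real transfers are slower.
    """
    bits = size_bytes * 8
    return bits / (oc_rate_mbps(level) * 1e6)


# Rates for the three link types named in the article.
for level in (12, 48, 192):
    print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mb/s")

# Hypothetical example: a 1-terabyte dataset (10**12 bytes).
one_terabyte = 1e12
for level in (12, 48, 192):
    hours = transfer_time_seconds(one_terabyte, level) / 3600
    print(f"1 TB over OC-{level}: about {hours:.2f} hours")
```

The nominal rates come out slightly different from the rounded figures in the text (for example, 622.08 Mb/s for OC-12), which is the usual way these links are quoted.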
Providing the Power
The electrical and mechanical infrastructure of CCS is designed specifically for a leadership-class computing system. The computer center can accommodate the delivery of up to 12 megawatts (MW) of electrical power to the computers and building infrastructure. Initially, 4 MW was provided to the computers and an additional 4 MW was made available for the remainder of the building. The building is programmed for expansion, with the pad space and conduit in place to easily ramp up to 8 MW for the computer systems, bringing the CCS total to 12 MW.
The Tennessee Valley Authority's electrical infrastructure provides very reliable power to the CCS. However, CCS also provides a 500-kilowatt (kW) uninterruptible power system that is backed up by a 750-kW diesel generator. The backup power is insufficient to supply the large computer systems, but it is more than enough to power all the networks, disk drives, storage systems, and server infrastructure that support the CCS.
To remove the heat generated by all this power, the facility has three 1200-ton chillers to provide cold water used to cool the computer systems. The chillers are fully redundant so that if any one fails, the other two can carry the load of the building. The cooling infrastructure is designed to allow easy expansion of the capacity without having to shut down the operating computer center.
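The redundancy described above can be checked with simple arithmetic. The Python sketch below computes the cooling capacity that remains when one chiller is out of service and converts it to megawatts of heat removal using the standard factor of roughly 3.517 kW per ton of refrigeration. The function name is hypothetical, and the article does not state the building's actual heat load, so the result is only an upper bound on what two chillers could carry.

```python
# 1 ton of refrigeration is about 3.517 kW of heat removal.
TON_TO_KW = 3.517


def redundant_capacity_tons(unit_tons: float, units: int, failed: int = 1) -> float:
    """Cooling capacity left when `failed` of `units` identical chillers are down."""
    return unit_tons * max(units - failed, 0)


# Figures from the article: three 1200-ton chillers, any one may fail.
capacity_tons = redundant_capacity_tons(1200, 3)
capacity_mw = capacity_tons * TON_TO_KW / 1000

print(f"Capacity with one chiller down: {capacity_tons:.0f} tons")
print(f"Equivalent heat removal: about {capacity_mw:.2f} MW")
```

With one chiller down, the remaining 2400 tons correspond to a little over 8 MW of heat removal.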
Evaluating Machine Performance
In a recent review by DOE and independent panels, the review committee referred to the CCS as the "pathfinder for the nation" in scientific computing. The CCS is the nation's only unclassified center that has deployed large-scale supercomputer systems representing the three major computer architectures. By having these three systems available in one place, the CCS is in a unique position to help computational scientists select the best architecture for each type of scientific application. Scientific applications—computer codes that enable supercomputers to run calculations on huge amounts of data to make scientific discoveries—will run faster or more efficiently on one computer architecture than on other architectures. Because CCS is an evaluation center, researchers test different codes to determine the best architecture for each code. The results guide both hardware vendors and software developers in improving current and next-generation systems.
The Cray X1 system, which has a scalable vector architecture, is the newest system being evaluated by the CCS evaluation team and users. The Cray X1 uses custom-designed vector processors (chips) to get high performance for scientific codes. The Cray-designed processors, fabricated by IBM, are linked together using a high-performance shared-memory interconnect technology. In other words, all the processors share the same memory, where they fetch and deposit data. The machine, named Phoenix, has 512 multistreaming processors, each of which can carry out as many as 12.8 billion operations per second, making the performance of the total system as high as 6.4 trillion operations per second. Early evaluation results show that, for applications that can take advantage of the vector processors, the Cray X1 runs 5 to 25 times faster than the same number of processors on the IBM Power4 system.
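The aggregate figure above is simply the processor count multiplied by the per-processor peak rate. The Python sketch below performs that multiplication with the numbers quoted in the text; the function name is an assumption made for illustration. The product comes out slightly above the 6.4 trillion quoted, a difference the article does not explain and that is presumably due to rounding.

```python
def peak_ops_per_second(processors: int, ops_per_processor: float) -> float:
    """Theoretical peak rate: every processor running at its maximum rate."""
    return processors * ops_per_processor


# Figures quoted in the article for Phoenix.
phoenix_peak = peak_ops_per_second(512, 12.8e9)
print(f"Peak: {phoenix_peak / 1e12:.2f} trillion operations per second")
```

A peak rate of this kind is an upper bound; the sustained rate on a real application depends on how well the code uses the vector processors.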
The second supercomputer architecture at CCS is the cluster of symmetric multiprocessor systems. These commodity off-the-shelf computers are linked together using a commodity or proprietary interconnect technology across which scientific applications are distributed. In the CCS, this architecture is represented by the IBM Power4 (Cheetah) and its predecessor, the IBM Power3 (Eagle). The IBM Power4 is capable of 4.5 trillion operations per second. The IBM Power3 is capable of 1 trillion operations per second. These supercomputers use IBM's custom processor chips. Both of these systems are stable and provide highly reliable supercomputing capability for many projects within DOE.
In 2003 these cluster-based IBM machines were the primary production resource for DOE's Scientific Discovery through Advanced Computing (SciDAC) program. Some 31 million MPP processor hours of IBM machine time were delivered to CCS users in DOE's SciDAC program in the past year. Researchers utilized more than 60 million MPP processor hours of IBM machine time for all DOE science applications.
CCS staff members are evaluating IBM's next-generation interconnect technology, which is the method used to link processors together. The IBM Power3 employs the first-generation technology, called "SP switch," and the IBM Power4 supercomputer uses the second-generation Colony technology.
The third supercomputer architecture in the CCS is the shared-memory system using commodity microprocessors. The SGI Altix, which has 256 Intel Itanium2™ processors, can expand up to 512 processors. The most remarkable feature of this computer, called Ram, is its memory, which holds 2 trillion bytes (2 terabytes) of data. By contrast, the Cray X1 and the IBM systems each have only about 1 terabyte of memory. The first supercomputer in Oak Ridge, the Cray X-MP installed in 1985, had one-millionth the memory of the SGI Altix.
The shared-memory programming model of the SGI Altix has several advantages. For example, the model makes it conceptually easier for developers of scientific applications to design their codes to run on such a machine. The model also enables certain kinds of applications to run very efficiently. ORNL researchers are seeing excellent results on this system in computational chemistry, global climate modeling, and computational biology.
Cray X1—A New Capability for Science
On August 15, 2002, DOE selected ORNL to test the effectiveness of the new Cray architecture in solving important scientific problems in climate, fusion, biology, nanoscale and magnetic materials, and astrophysics. Under the program, ORNL acquired the first few nodes of a Cray X1 supercomputer system.
The Cray X1 system installed at ORNL is the first U.S. computer to offer vector processing and massively parallel processing capabilities in a single architecture. The system is designed to be scaled up to provide scientific applications with performance greater than can be achieved on currently available U.S. computers. Phoenix has 256 multistreaming vector processors (MSPs). Each MSP has 2 megabytes (MB) of cache memory. Four MSPs form a node that has 16 gigabytes (GB) of shared memory.
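The node layout described above determines the machine's total memory. The Python sketch below works through that arithmetic with the figures from this paragraph; the variable names are assumptions made for illustration.

```python
# Figures from the paragraph above.
MSP_COUNT = 256          # multistreaming vector processors in Phoenix
MSPS_PER_NODE = 4        # four MSPs form one node
NODE_MEMORY_GB = 16      # shared memory per node
CACHE_PER_MSP_MB = 2     # cache memory per MSP

nodes = MSP_COUNT // MSPS_PER_NODE
total_shared_gb = nodes * NODE_MEMORY_GB
total_cache_mb = MSP_COUNT * CACHE_PER_MSP_MB

print(f"Nodes: {nodes}")
print(f"Total shared memory: {total_shared_gb} GB")
print(f"Total cache: {total_cache_mb} MB")
```

The result, 64 nodes with 1024 GB of shared memory in total, agrees with the figure of about 1 terabyte given for the Cray X1 elsewhere in the article.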
In 2002 CCS procured the first 32 processors of the Cray X1 to evaluate the processors, memory subsystem, scalability of the architecture, and software environment and so determine the supercomputer's suitability for solving complex scientific problems. CCS and Cray have been evaluating the Phoenix processors to predict the system's expected sustained performance on key DOE application codes. The benchmark results show that Phoenix's architecture is exceptionally fast for most operations and that it markedly improves the performance of several scientific applications.
For example, large-scale simulations of high-temperature superconductors run 25 times faster on the Phoenix system than on an IBM Power4 cluster using the same number of processors. A fusion application requires 16 times more processors on an IBM Power3 cluster to achieve the same performance as that of Phoenix.
The best performance of the parallel ocean-circulation program (POP v1.4.3) for climate simulations is 50% higher on Phoenix than on Japan's Earth Simulator and 5 times higher on Phoenix than on an IBM Power4 cluster. "Even at 256 processors," the evaluation report states, "the Cray X1 system is already outperforming other supercomputers with thousands of processors for certain class of applications such as climate modeling and some fusion applications."
An astrophysics simulation runs 15 times faster on Phoenix than on an IBM Power4 cluster using the same number of processors. A molecular dynamics simulation related to the phenomenon of photon echo—light from an exploding star scattered off shells of cosmic dust—runs 8 times faster on Phoenix than on other supercomputers.
The best results are shown on large problems, where researchers cannot fit the entire problem into the cache of the processors. These large problems are the types important to DOE and other government agencies and industries interested in ultrascale simulations. The increased performance enables some simulations to fully resolve questions raised by previous scientific studies.
Based on the extremely positive results, the plan for fiscal year 2004 is to expand the initial Cray X1 system from 8 cabinets (256 processors) to 10 to 12 cabinets with 640 to 768 processors to develop and deploy the scalable operating system, file system, and other critical features. Along with the deployment of the Cray X1, the CCS staff is partnering with Cray to develop its next-generation supercomputer, code-named "Black Widow." The system will be capable of 100+ teraflops and is designed specifically for DOE science applications. If a combination of government and industry commits the financing, this advanced supercomputer system could be deployed in fiscal year 2006.
Visualizing and Storing Supercomputer Results
A key part of the new CCS facility is a state-of-the-art Science Visualization Facility for visualization of multi-terabyte datasets generated by simulation programs run on internal computer systems. The signature capability of this facility is the 35-million-pixel, 30-ft.-wide display wall used for viewing high-resolution images. The display is similar to an IMAX movie screen for viewing scientific data. Such visualizations enable researchers to better understand and communicate the significance of the results of supercomputer calculations.
Beyond visualization, ORNL helped to develop a state-of-the-art data storage system that has become the de facto standard for many of the largest supercomputing centers in the world. CCS has one of these data repositories, which is aptly named the High Performance Storage System (HPSS). The system received an R&D 100 award in 1995 from R&D magazine as one of the 100 best innovations of that year.
The HPSS at ORNL stores files of multiple terabytes for the users of the CCS computers. Other customers include NASA's Distributed Active Archive Center and the Atmospheric Radiation Measurement Program, the largest global change research program supported by DOE.
HPSS is extremely scalable. The system can support data import rates and data objects of many different sizes. Archive files range in size from a few megabytes up to many terabytes. The CCS is continually upgrading the capability and capacity of HPSS by increasing the bandwidth, upgrading storage devices, and improving system availability.
With an acre of new space, advanced network capabilities, unlimited infrastructure, and state-of-the-art data storage systems, DOE's Center for Computational Sciences at ORNL is ready to assume the primary role in developing a leadership-class computing program for America's scientific future.
Web site provided by Oak Ridge National Laboratory's Communications and External Relations