Scientific applications—computer codes that enable supercomputers to run calculations on huge amounts of data to solve complex problems—will help researchers make important scientific discoveries more quickly if the codes are run on more powerful unclassified supercomputers than exist today. The goal of the Department of Energy is to increase by a factor of 100 the computing capability available to support open scientific research. Such leadership-class computing would reduce the time required to simulate complex systems, such as future climate, from years to days.
The future technologies team at DOE's Center for Computational Sciences at ORNL plays an important role in the development of leadership-class computing. By investigating the continuously evolving core technologies critical to leadership-class systems, the future technologies team led by Jeffrey Vetter is identifying technologies that satisfy the performance requirements of DOE applications. Furthermore, with their intimate knowledge of the applications, software, and hardware, this team works cooperatively with vendors to ensure that the next generation of computing technology meets the requirements of DOE mission applications. Initially, the future technologies team is focusing on four application areas: biology, climate, fusion, and nanoscience.
In collaboration with universities and other government labs, the future technologies team evaluates new computer architectures, tracks and helps design future architectures, gathers contemporary application-driven systems requirements, develops performance prediction capabilities for leadership-class computing, and assesses the state of software for these systems.
Evaluating New Architectures
Computer architectures are evolving at a rapid rate. The future technologies team studies these new architectures and evaluates the suitability of each to important ORNL applications. The overall goal of this strategy is to identify each system's architectural strengths and weaknesses in the context of these applications.
Recently, CCS evaluated several systems, including the Cray X1, the SGI Altix 3000, and the IBM Power4 cluster. In the short term, the future technologies team will evaluate the Cray Red Storm system, the IBM Blue Gene/L system, and the IBM Federation interconnect. Evaluations on the horizon will focus on the Cray X2, the IBM Power5, and the DARPA HPCS systems from Cray, IBM, and Sun Microsystems. In addition, the future technologies team is evaluating individual core technologies for leadership-class computing, including processors, interconnects, memory subsystems, storage subsystems, system software, and programming models. Core technologies that the team is considering include IBM's Federation interconnect, the InfiniBand interconnect, and processors for reconfigurable computing.
Simply put, CCS constantly surveys the computing landscape for future architectures that may offer orders-of-magnitude improvements in leadership-class computing. Thus, the future technologies team continues to track and evaluate evolving technologies such as system-on-a-chip (SOC), processor-in-memory (PIM), simultaneous multithreading, smart networks, optical interconnects, reconfigurable computers, and streaming supercomputers. In some cases, these technologies could provide CCS users with tremendous improvements in their scientific productivity because of higher performance, lower costs of ownership, and increased reliability and availability. Using evidence gathered by the future technologies team, CCS makes informed decisions about which speculative architectures to endorse, fund, and procure.
Application-Driven System Requirements
In addition to its evaluation activities, the future technologies team maintains an ongoing, contemporary set of application-driven systems requirements to use for procurements and for feedback to architects. In conjunction with the user communities, the group has identified major classes of scientific applications likely to dominate leadership-class system usage over the next five to ten years. Within each area, the CCS team investigates the machine properties (e.g., floating-point performance, memory, interconnect performance, input/output capability, and mass storage capacity) needed to enable major progress in each application class. This information helps CCS address major hardware, software, and algorithmic challenges to the effective use of leadership-class computing systems.
Generating application-driven systems requirements and performing evaluations on existing systems provide some insight into which architectures best match CCS workloads. However, the best possibility for furthering the understanding of performance phenomena and for assisting in intelligent procurement selections may lie in the technique of performance modeling. By maintaining core competencies in modeling, measurement, and simulation of computer systems, CCS has at its disposal not only performance information about existing systems but also the capability to speculate about future architectures. With this information, it is possible to predict the performance a future system will achieve even though that system is much larger than any system available today.
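The idea behind such performance prediction can be illustrated with a simple analytic model. The sketch below is not the team's actual methodology; it assumes a basic Amdahl-style decomposition of runtime into serial work, perfectly divided parallel work, and a communication term that grows logarithmically with processor count (as in a tree-based reduction). All parameter values are hypothetical.

```python
import math

# A minimal sketch of analytic performance modeling: calibrate a simple
# model on a system you can measure, then extrapolate to a much larger one.
# t1 = single-processor runtime (s), serial_frac = non-parallelizable share,
# comm_per_step = per-level communication cost (s). All values hypothetical.
def predict_runtime(p, t1=1000.0, serial_frac=0.01, comm_per_step=0.002):
    """Predicted runtime on p processors."""
    serial = serial_frac * t1                  # work that cannot be parallelized
    parallel = (1.0 - serial_frac) * t1 / p    # perfectly divided work
    comm = comm_per_step * math.log2(p) if p > 1 else 0.0
    return serial + parallel + comm

# Extrapolate from measurable scales to a speculative leadership-class scale.
for p in (64, 1024, 16384):
    print(f"{p:6d} processors -> predicted runtime {predict_runtime(p):8.2f} s")
```

Even this toy model shows the qualitative behavior that matters for procurement decisions: beyond a certain processor count, the serial fraction and communication costs dominate, so raw processor counts alone do not predict application performance.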
Software infrastructure plays a critical role in reducing the time needed for computational scientists to solve a problem across all phases of application development and use. Simply addressing algorithms and architectures is insufficient to ensure success on a leadership-class computer. First, software programming models and design must efficiently harness the underlying computer hardware to ensure reasonable execution performance. Second, software will play an important role in the reliability and availability of these new architectures. Applications and system software must adapt to and overcome faults in the underlying hardware. Third, scientists' productivity is often at odds with current software development techniques. In particular, the current methods of constructing, optimizing, and using massively parallel scientific applications can be difficult, tedious, and inefficient.
The future technologies team in CCS is in a critical position to evaluate and experiment with alternative technologies on leadership-class computers. Scalable, reliable system software is vital to the operation of a leadership-class platform. This software will provide baseline services such as operating systems, file systems, job and task scheduling, communication (MPI) libraries, resource management, configuration management, security, programming model support, and fault management. Given the possible scale of new systems that will have thousands to millions of processors, it is imperative that these services be efficient, scalable, and reliable. Because the requirements for scientific computing on this scale differ drastically from the requirements for commodity operating systems, the future technologies team is evaluating software environments to judge the scalability and reliability of system software and the associated programming environments and tools. Using this information, the team maintains a list of critical software requirements and undertakes research to solve these problems, in collaboration with universities, vendors, and other DOE national laboratories.
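The scale argument above can be made concrete with a back-of-the-envelope reliability calculation. The sketch below assumes independent node failures with exponentially distributed lifetimes, in which case the system's mean time between failures (MTBF) is the node MTBF divided by the node count; the node MTBF used here is a hypothetical figure, not a measured one.

```python
# A minimal sketch of why fault management is imperative at scale.
# Assumption: independent, exponentially distributed node failures,
# so system MTBF = node MTBF / N. node_mtbf_hours is hypothetical.
def system_mtbf_hours(n_nodes, node_mtbf_hours=500000.0):
    """Mean time between failures for the whole system, in hours."""
    return node_mtbf_hours / n_nodes

for n in (1000, 100000, 1000000):
    print(f"{n:8d} nodes -> system MTBF ~ {system_mtbf_hours(n):10.2f} hours")
```

Under these assumptions, a million-node machine built from nodes that individually fail once in decades still sees a failure every half hour, which is why applications and system software must adapt to and overcome hardware faults rather than assume a fault-free run.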
Ultimately, a goal of the future technologies team is to ensure that users can fully capitalize on leadership-class hardware and software to boost scientific productivity in meeting DOE missions. In this role of pathfinder for CCS, the future technologies team explores new technologies and interacts with users, researchers, and vendors to drive promising technologies forward.