Some computational scientists’ high-performance computing jobs are getting done even while they are not working, thanks in part to networking. Networks, particularly high-speed networks, allow supercomputer nodes to “talk” to each other, send messages to other nodes asking for data, and transmit large data files across the country. In addition, networks allow computational scientists to keep tabs on the progress of jobs that often run for days at a time.
DOE’s Center for Computational Sciences (CCS) at ORNL has supercomputers, as do the National Science Foundation center in Pittsburgh, the National Center for Supercomputing Applications facility at San Diego, and DOE’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL) in California. But, according to Bill Wing of ORNL’s Computer Science and Mathematics Division, the similarities stop there.
“Other supercomputer centers slice up their resources on a fine scale and run hundreds of jobs for thousands of users,” he says. “We are different. We focus our computational resources on a few high-end users who need massive computing capacity for climate prediction, human genome analysis, and materials science simulations. Our customer base, which includes many users outside ORNL, has different needs, including different network needs, so we use a different model.”
At CCS the computational scientists modeling future climate or exploding stars or searching for genes in DNA sequences run jobs for days or weeks at a time and generate huge files of calculated results that are transmitted between ORNL and NERSC’s data archives. Sometimes a single climate modeling run can produce 1 trillion bytes (1 terabyte) of data. These data are sent between ORNL and NERSC in chunks of 250 million bytes (250 megabytes).
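To make the scale concrete, the figures above (a 1-terabyte run sent in 250-megabyte chunks) can be worked through in a short Python sketch. The function name `chunk_ranges` is hypothetical and is not taken from any ORNL or NERSC software; it only illustrates how a large file might be divided into fixed-size pieces.

```python
# Illustrative only: decimal units, as in the article
# (1 terabyte = 10**12 bytes, 250 megabytes = 250 * 10**6 bytes).
TERABYTE = 10**12
CHUNK_BYTES = 250 * 10**6

def chunk_ranges(total_bytes, chunk_bytes):
    """Yield (offset, length) pairs that cover total_bytes in order."""
    offset = 0
    while offset < total_bytes:
        # The final chunk may be shorter than chunk_bytes.
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length

ranges = list(chunk_ranges(TERABYTE, CHUNK_BYTES))
print(len(ranges))   # number of chunks needed for one run
print(ranges[-1])    # offset and length of the last chunk
```

Under these assumptions a 1-terabyte run divides evenly into 4,000 chunks.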
"We focus on moving large files of data," Wing says. "ORNL and LBNL are writing computer programs to ensure that these data packages slide through the network rather than clog it. In addition, we are developing the ability to allow users to monitor the progress of these long-running jobsand steer them if necessaryfrom a variety of portable access points, including laptops and personal digital assistants like Palm Pilots or iPAQs."
Data are sent over the network mostly using the transmission control protocol (TCP), a predefined protocol that computers use to communicate over a network. LBNL and ORNL researchers are devising ways to improve the ability to send large files so that supercomputers are not idle because of delays in data delivery.
To reduce delays in data delivery, Nageswara Rao of ORNL’s Computer Science and Mathematics Division has developed a computer program called NetLet that is being tested on 12 free telnet and university sites serving as monitors and routers. “NetLet allows computers to efficiently talk with each other, ‘predict’ the delay in getting the message to the receiver, and suitably route the message,” Rao says. “This algorithm enables the computers to measure connection speeds and the delays of pathways and then identify the best combination of pathways to get the information delivered efficiently in the time or at the rate guaranteed.”
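The idea Rao describes, measuring the speed and delay of each pathway and then picking the best one, can be illustrated with a small Python sketch. This is not NetLet's actual algorithm, which the article does not detail. It assumes a simple model in which delivery time equals the summed link delays plus the message size divided by the slowest link's bandwidth, and every name and number below is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    delay_s: float        # measured one-way delay of this hop, in seconds
    bandwidth_bps: float  # measured bandwidth of this hop, in bits per second

def estimated_delivery_time(path, message_bits):
    """Predict delivery time for a path given as a list of Link objects."""
    total_delay = sum(link.delay_s for link in path)
    bottleneck = min(link.bandwidth_bps for link in path)
    return total_delay + message_bits / bottleneck

def best_path(paths, message_bits):
    """Return the name of the path with the lowest predicted delivery time."""
    return min(paths, key=lambda name: estimated_delivery_time(paths[name], message_bits))

# Hypothetical measurements: a short, slow direct route and a
# longer two-hop route through a faster relay site.
paths = {
    "direct": [Link(delay_s=0.020, bandwidth_bps=10e6)],
    "via_relay": [Link(delay_s=0.030, bandwidth_bps=40e6),
                  Link(delay_s=0.040, bandwidth_bps=25e6)],
}

print(best_path(paths, message_bits=800))        # small message
print(best_path(paths, message_bits=8_000_000))  # large message
```

With these made-up numbers the direct route wins for the small message because delay dominates, while the relayed route wins for the large one because bandwidth dominates. That is why a router of this kind has to measure both quantities.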
Demonstrations of NetLet have shown that the algorithm cuts data delivery time by about 40% without any additional support from the Internet routers. “Some of our data files used to take 10 seconds to get from our computer to a destination computer,” Rao says. “Those same data files can now get there in 6 seconds. That means that a huge data file that took 10 hours to arrive at a destination computer can now get there in 6 hours.”
The data files transmitted from ORNL's Eagle (IBM RS/6000 SP supercomputer) to the NERSC data archive fly over DOE’s Energy Sciences Network (ESnet), a semiprivate part of the Internet. Currently, DOE facilities such as ORNL and LBNL are using the new ESnet (OC12), a high-speed link operated by Qwest that supports data transmission at 622 megabits per second, 4 times faster than the old ESnet.
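As a rough sense of what the link speed means for a large climate run, the following Python sketch estimates the best-case transfer time for 1 terabyte at the full 622 megabits per second. The old ESnet rate is an assumption derived from the "4 times faster" figure. The calculation also ignores protocol overhead and competing traffic, so real transfers would take longer.

```python
def transfer_hours(size_bytes, rate_bits_per_s):
    """Best-case transfer time in hours at a constant line rate."""
    return size_bytes * 8 / rate_bits_per_s / 3600

RUN_BYTES = 10**12       # 1 terabyte, decimal units
NEW_RATE = 622e6         # new ESnet (OC12), bits per second
OLD_RATE = NEW_RATE / 4  # assumed from the "4 times faster" comparison

new_hours = transfer_hours(RUN_BYTES, NEW_RATE)
old_hours = transfer_hours(RUN_BYTES, OLD_RATE)
print(f"new link: {new_hours:.1f} h, old link: {old_hours:.1f} h")
```

Under these idealized assumptions a 1-terabyte run needs roughly 3.6 hours on the new link and roughly 14.3 hours on the old one.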
Web site provided by Oak Ridge National Laboratory's Communications and External Relations