Up to Speed
SCIENTISTS, VENDORS MEET TO DEFINE OBSTACLES, SOLUTIONS IN DESIGNING, DEPLOYING PETASCALE SYSTEMS
May 1, 2007
A DOE workshop hosted by NERSC brought together the international supercomputing community in May to identify challenges for deploying petascale systems, a collaboration that resulted in a series of recommendations.
The two-day meeting in San Francisco attracted about 70 participants from roughly 30 supercomputer centers, vendors, and other research institutions, as well as DOE program managers from Advanced Scientific Computing Research (ASCR), which funded the workshop. The discussions covered a wide range of topics, including facility requirements, integration technologies, performance assessment, and problem detection and management.
While the attendees were experienced in developing and managing supercomputers, they recognized that the leap to petascale computing will require more creative approaches.
“We are going through a learning curve. There is fundamental research to be done because of the change in technology and scale,” said Bill Kramer, NERSC’s General Manager who led the workshop.
In his welcoming remarks, Dan Hitchcock, Acting Director of the Facilities Division within ASCR, urged more collaboration, noting that various stakeholders in the high performance computing community have historically worked independently to solve thorny integration problems.
Mark Seager from Lawrence Livermore National Laboratory and Tom Bettge from the National Center for Atmospheric Research (NCAR) helped jumpstart the workshop by sharing their experiences with designing state-of-the-art computer rooms and deploying their most powerful systems.
Both Seager and Bettge said that supplying enough power to massive supercomputers will become a growing headache. A search for more space and a reliable power supply led NCAR to Wyoming, where it will partner with the state of Wyoming and the University of Wyoming to build a $60 million computer center in Cheyenne.
Seager also advocated the creation of a risk management plan to anticipate the worst-case scenario. “I would argue that the mantra is ‘maximizing your flexibility,’” Seager said. “Integration is all about making lemonade out of lemons. You need a highly specialized customer support organization, especially during integration.”
Six breakout sessions took place over the two days to home in on specific issues, such as the best methods for performance testing and the roles of vendors, supercomputer centers, and users in ensuring that systems continue to run well after deployment.

The workshop program also included a panel of vendors offering their views on deployment challenges. The speakers, representing IBM, Sun Microsystems, Cray, and Linux Networx, discussed constraints they face, such as balancing the need to invest heavily in research and development with the pressure to turn a profit.

A second panel of supercomputer center managers shared their perspectives on major hurdles to overcome. For example, Patricia Kovatch from the San Diego Supercomputer Center hypothesized that the exponential growth in data could cost her center $100 million for petabyte tapes. Currently, the center spends about $1 million a year on tapes.
“We feel that computing will be driven by memory, not CPU. The tape cost is a bigger problem than even power,” Kovatch told the audience.
Memory is clearly one of many challenges. Leaders from each breakout session presented slides detailing the daunting tasks ahead, including software development, acceptance testing, and risk management.
At the end of the workshop, Kramer led the discussion to prioritize major challenges that emerged from the breakout sessions and enlisted the audience’s input to define strategies for tackling those problems. A report detailing the discussions and recommendations will be issued.
The workshop had its moments of levity. During dinner, attendees shared humorous tales about blunders they had made or witnessed in their line of work. The stories included a machine room design that left only 3 inches of headroom for computer cables; a cheapskate landlord who shut off power to the cooling system at night and on weekends to save money; and a comment from a former U.S. president: “Oh Seymour Cray, I’ve heard of him. Doesn’t he work for IBM?”

Learn more about the discussions and see the slides from the workshop at http://www.nersc.gov/projects/HPC-Integration.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy. Learn more about computing sciences at Berkeley Lab.