
Up to Speed

SCIENTISTS, VENDORS MEET TO DEFINE OBSTACLES, SOLUTIONS IN DESIGNING, DEPLOYING PETASCALE SYSTEMS

May 1, 2007

A DOE workshop hosted by NERSC brought together the international supercomputing community in May to identify challenges for deploying petascale systems, a collaboration that resulted in a series of recommendations.

The two-day meeting in San Francisco attracted about 70 participants from roughly 30 supercomputer centers, vendors and other research institutions, along with DOE program managers from Advanced Scientific Computing Research (ASCR), which funded the workshop. The discussions covered a wide range of topics, including facility requirements, integration technologies, performance assessment, and problem detection and management.

While the attendees were experienced in developing and managing supercomputers, they recognized that the leap to petascale computing will require more creative approaches.

“We are going through a learning curve. There is fundamental research to be done because of the change in technology and scale,” said Bill Kramer, NERSC’s General Manager, who led the workshop.

The move to petascale computing will also drive changes in data flow, as Patricia Kovatch of the San Diego Supercomputer Center outlined at the workshop on deploying next-generation systems.

In his welcoming remarks, Dan Hitchcock, Acting Director of the Facilities Division within ASCR, urged more collaboration, noting that various stakeholders in the high performance computing community have historically worked independently to solve thorny integration problems.

Mark Seager from Lawrence Livermore National Laboratory and Tom Bettge from the National Center for Atmospheric Research (NCAR) helped jumpstart the workshop by sharing their experiences with designing state-of-the-art computer rooms and deploying their most powerful systems.

Both Seager and Bettge said that supplying enough power to massive supercomputers will become a growing headache. A search for more space and a reliable power supply led NCAR to Wyoming, where it will partner with the state of Wyoming and the University of Wyoming to build a $60 million computer center in Cheyenne.

Seager also advocated the creation of a risk management plan to anticipate the worst-case scenario. “I would argue that the mantra is ‘maximizing your flexibility,’” Seager said. “Integration is all about making lemonade out of lemons. You need a highly specialized customer support organization, especially during integration.”

Six breakout sessions took place over the two days to home in on specific issues, such as the best methods for performance testing and the roles of vendors, supercomputer centers and users in ensuring that systems continue to run well after deployment.

The workshop program also included a panel of vendors offering their views on deployment challenges. The speakers, representing IBM, Sun Microsystems, Cray and Linux Networx, discussed constraints they face, such as balancing the need to invest heavily in research and development with the pressure to turn a profit.

A second panel of supercomputer center managers proffered their perspectives on major hurdles to overcome. For example, Patricia Kovatch from the San Diego Supercomputer Center hypothesized that the exponential growth in data could cost her center $100 million for petabyte tapes; currently the center spends about $1 million a year on tapes.

“We feel that computing will be driven by memory, not CPU. The tape cost is a bigger problem than even power,” Kovatch told the audience.
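Kovatch’s numbers point to a simple dynamic: if the volume of data to be archived each year grows faster than the price per petabyte of tape falls, annual tape spending compounds. The back-of-envelope sketch below is purely illustrative; the growth and price-decline rates are assumptions chosen for illustration, not figures from the workshop.

# Illustrative projection (assumed rates, not from the workshop report):
# annual tape spending compounds when data growth outpaces falling tape prices.

def project_tape_spending(base_spend=1_000_000,  # ~$1M/year today, per Kovatch's figure
                          data_growth=2.0,       # assumption: data archived per year doubles
                          price_decline=1.3,     # assumption: $/PB of tape falls ~30%/year
                          years=10):
    """Return a list of (year, projected annual tape spend in dollars)."""
    spend = base_spend
    projection = []
    for year in range(1, years + 1):
        spend *= data_growth / price_decline  # net ~1.5x increase per year
        projection.append((year, spend))
    return projection

if __name__ == "__main__":
    for year, spend in project_tape_spending():
        print(f"year {year:2d}: ~${spend / 1e6:5.1f}M on tape")

Under those assumed rates, annual spending grows roughly 1.5x per year and reaches tens of millions of dollars within a decade, the kind of trajectory behind the concern Kovatch described.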

Memory is clearly one of many challenges. Leaders from each breakout session presented slides detailing the daunting tasks ahead, including software development, acceptance testing and risk management.

At the end of the workshop, Kramer led the discussion to prioritize major challenges that emerged from the breakout sessions and enlisted the audience’s input to define strategies for tackling those problems. A report detailing the discussions and recommendations will be issued.

The workshop had its moments of levity. During dinner, attendees shared humorous tales about blunders they had made or witnessed in their line of work. The stories included a machine room design that left only 3 inches of headroom for computer cables; a cheapskate landlord who shut off power to the cooling system at night and on weekends to save money; and a comment from a former U.S. president: “Oh Seymour Cray, I’ve heard of him. Doesn’t he work for IBM?”

Learn more about the discussions and see the slides from the workshop at http://www.nersc.gov/projects/HPC-Integration.


About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy. Learn more about computing sciences at Berkeley Lab.