NERSC Hosts Workshop About the Dawn of Exascale Storage
July 31, 2009
This month, the Department of Energy's (DOE) National Energy Research Scientific Computing Center (NERSC) hosted the first workshop that discussed strategies for managing and storing the influx of new archival data that will be produced in the exascale era, when supercomputers will be capable of achieving quintillions (1,000 quadrillion) of calculations per second. Experts predict that DOE's first exascale supercomputer for scientific research will be deployed in 2018.
"Most of the extreme scale workshops to date primarily focus on the high-level challenges. Our workshop is the first to address the details of getting data in and out of the exascale machine and storing the enormous amount of data that will be produced by the new technology,"
- Jason Hick, NERSC’s Mass Storage Group
“I think it's a real honor that DOE asked NERSC to lead the workshop on mass storage,” says Jason Hick, who heads NERSC's Mass Storage Group. “Most of the extreme scale workshops to date primarily focus on the high-level challenges. Our workshop is the first to address the details of getting data in and out of the exascale machine and storing the enormous amount of data that will be produced by the new technology.”
Last year the world witnessed the dawn of petascale supercomputing when two DOE machines achieved petaflop/s performance, carrying out quadrillions (1,000 trillion) of floating point calculations per second. From climate research to nuclear physics, the new capabilities are already enabling scientists to tackle problems that were previously impossible. Despite these successes, the dawn of petascale also shed light on the limitations of various data technologies, including storage software and hardware. Experts expect these issues to be magnified when exascale computers that perform 1,000 times more calculations per second than today's fastest supercomputer come online. The workshop was an effort by DOE's Office of Science to preempt future problems by getting experts to identify challenges now and begin thinking about solutions.
According to Hick, one of the primary goals of the July workshop was to develop a strategy for determining whether the High Performance Storage System (HPSS) software would meet the demands of exascale computing. HPSS currently manages petabytes of data on disk and in robotic tape libraries, and is running on most DOE mass storage systems across the country. With this workshop, the DOE is very interested in ensuring the current archival storage system is positioned to meet the demands of extreme scale storage.
HPSS was developed in a collaboration that began more than a decade ago between IBM and five DOE national laboratories: Lawrence Berkeley National Laboratory (which hosts NERSC), Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory and Sandia National Laboratories. Representatives from all of these organizations attended the workshop and contributed to discussions about how to modify HPSS for computing in the extreme scale era. Other participants included Argonne National Laboratory and the Chief Technology Officer of Instrumental Inc.
“Disk performance is increasing at about 5 percent a year; this is impacting the performance and size of file systems available to science researchers. It further impacts the feasibility in managing, whether it be analyzing or archiving, the data. This trend is one example of a problem that storage has to deal with in order to realize science researchers' desire to improve on their science productivity in the extreme scale era,” says Hick. “These workshops give the DOE supercomputing facilities the ability to discuss these sorts of issues and propose solutions.”
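To put Hick's figure in perspective, the short Python sketch below compares disk and compute growth over the same period. It rests on assumptions that are not in the article: that the 5 percent figure compounds annually, and that the window runs from 2008 (the petascale debut described above) to the predicted 2018 exascale deployment. The function name and constants are illustrative only.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total growth factor after compounding a yearly rate for `years` years."""
    return (1.0 + rate_per_year) ** years

# Assumed window: petascale debut (2008) to predicted exascale deployment (2018).
YEARS = 2018 - 2008

# Assumption: disk performance compounds at about 5 percent per year.
disk_gain = compound_growth(0.05, YEARS)

# Exascale is 1,000 times petascale by definition.
compute_gain = 1000.0

# How much faster compute grows than disk performance over the window.
gap = compute_gain / disk_gain

print(f"Disk performance gain over {YEARS} years: {disk_gain:.2f}x")
print(f"Compute gain over the same period: {compute_gain:.0f}x")
print(f"Compute outpaces disk by roughly {gap:.0f}x")
```

Under these assumptions, disk performance grows by well under a factor of two while compute grows a thousandfold, which is roughly the kind of gap that storage planning for the extreme scale era would have to absorb.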
As a foundation for the discussion, workshop participants put together a 51-page white paper that contained input from DOE labs currently using HPSS, industry storage hardware trends and an independent market survey of hierarchical storage management software.
“When the HPSS collaboration began in 1992 to develop a scalable high performance storage system, HPC power was rated in tens of gigaflops and within a couple of years will be at tens of petaflops; storage capacity was at tens of terabytes and HPSS is now handling tens of petabytes, expanding shortly to hundreds of petabytes; instantaneous throughput was in megabytes per second and HPSS is now handling gigabytes per second and will scale to hundreds of gigabytes per second,” says Dick Watson, of the Lawrence Livermore National Laboratory, who was one of the founders of the HPSS collaboration.
“We think that HPSS with its many scalable fundamental architectural features and current scaling plans will be able to handle the requirements of the exascale era discussed at this workshop,” he adds.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.