NERSC Global Filesystem Now Provides Seamless Data Access from All Systems
February 1, 2006
In February, NERSC deployed the NERSC Global Filesystem (NGF) into production, providing seamless data access from all of the Center's computational and analysis resources. With NGF, users can now run applications on Seaborg, for example, then use DaVinci to visualize the data without having to explicitly move a single data file.
NGF is intended to facilitate sharing data among users and machines. For example, if a project has multiple users who must all access a common set of data files, NGF provides a common area for those files. And when sharing data between machines, NGF eliminates the need to copy large datasets from one machine to another: because NGF has a single unified namespace, a user can run a highly parallel simulation on Seaborg, followed by a serial or modestly parallel post-processing step on Jacquard, and then perform a data analysis or visualization step on DaVinci.
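To make that workflow concrete, here is a minimal sketch from the user's point of view, assuming a hypothetical project directory, /project/astro, that appears at the same path on every system (actual NGF mount points and directory names may differ):

```python
# A minimal sketch of a cross-system NGF workflow, assuming a
# hypothetical project directory /project/astro that NGF presents at
# the same path on Seaborg, Jacquard, and DaVinci.
from pathlib import Path

PROJECT = Path("/project/astro")

def write_checkpoint(run: str, step: int, data: bytes) -> Path:
    """Simulation side (e.g., on Seaborg): write a checkpoint into NGF."""
    out = PROJECT / run / f"checkpoint_{step:05d}.dat"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(data)
    return out

def read_checkpoint(run: str, step: int) -> bytes:
    """Analysis side (e.g., on Jacquard or DaVinci): open the same file
    by the same path; no scp or ftp staging step is needed."""
    return (PROJECT / run / f"checkpoint_{step:05d}.dat").read_bytes()
```

Because every system sees the same namespace, the transfer step simply disappears from the workflow.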
“NGF stitches all of our systems together. When you go from system to system, your data is just there,” said Greg Butler, leader of the NGF project. “Users don’t have to manually move their data or keep track of it. They can now see their data simultaneously and access the data simultaneously.”
NERSC staff began adding NGF to computing systems last October, starting with the DaVinci visualization cluster (an SGI Altix) and finishing with the Seaborg IBM SP system in December. To help test the system before it entered production, a number of NERSC users were given early access; these early users helped identify problems with NGF so they could be addressed before the filesystem was opened to the general user community.
“I have been using the NGF for some time now, and it's made my work a lot easier on the NERSC systems,” said Martin White, a physicist at Berkeley Lab. “I have at times accessed files on NGF from all three compute platforms (Seaborg, Jacquard and Bassi) semi-simultaneously.”
NGF also makes it easier for members of collaborative groups to access data, and it helps ensure data consistency by eliminating multiple copies of critical files.
Christian Ott, a Ph.D. student and member of a team studying core-collapse supernovae, wrote that “the project directories make our collaboration much more efficient. We can now easily look at the output of the runs managed by other team members and monitor their progress etc. We are also sharing standard input data for our simulations.”
NERSC General Manager Bill Kramer said that as far as he knows, NGF is the first production global file system spanning five platforms (Seaborg, Bassi, Jacquard, DaVinci and PDSF), three system architectures, and four different vendors. While other centers and distributed computing projects such as NSF's TeraGrid may also have shared file systems, Butler said he thinks NGF is unique in its heterogeneity.
NGF's heterogeneous approach is a key component of “Science-Driven Computing,” NERSC's five-year plan. This approach is important because NERSC typically procures a major new computational system every three years, then operates it for five years to support DOE research. Consequently, NERSC operates in a heterogeneous environment with systems from multiple vendors, multiple platforms, different system architectures, and multiple operating systems. The deployed file system must operate in that heterogeneous client environment throughout its lifetime.
Butler noted that the project, which is based on IBM's proven GPFS technology (NERSC was a research partner in its development), started about five years ago. While the computing systems, storage and interconnects were mostly in place, deploying a shared file system across all those resources was a major step beyond a parallel file system. In addition to the different system architectures, there were also different operating systems to contend with. The last servers and storage were deployed on Feb. 10. To keep everything running and ensure a “graceful shutdown” in the event of a power outage, a large uninterruptible power supply has been installed in the basement of the Oakland Scientific Facility.
While NGF is a significant change for NERSC users, it also “fundamentally changes the center in terms of our perspective,” Butler said. For example, when the staff needs to do maintenance on the file system, the various groups need to coordinate their efforts and take all the systems down at once.
Storage servers, accessing the consolidated storage through the shared-disk file system, provide hierarchical storage management (HSM), backup, and archival services. The first phase of NGF is focused on function rather than raw performance, but to be effective, NGF must perform comparably to native cluster file systems. The current capacity of NGF is approximately 70 TB of user-accessible storage and 50 million inodes (the data structures that describe individual files). Default project quotas are 1 TB and 250,000 inodes. The system sustains about 3 GB/sec of streaming I/O bandwidth, although actual performance for user applications will depend on a variety of factors. Because NGF is a distributed network filesystem, its performance will be slightly lower than that of filesystems local to NERSC compute platforms; this should matter only for applications whose performance is I/O bound.
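To relate these numbers to day-to-day use, the following sketch uses only standard POSIX calls and a hypothetical mount point, /project, to report capacity and inode headroom, and works out what the default quota implies about average file size:

```python
# Sketch of checking NGF capacity and inode usage with standard POSIX
# calls; /project is a hypothetical mount point for an NGF project area.
import os

def fs_report(mount: str = "/project") -> None:
    s = os.statvfs(mount)
    total_tb = s.f_frsize * s.f_blocks / 1e12  # total capacity, TB
    free_tb = s.f_frsize * s.f_bavail / 1e12   # space available to users, TB
    print(f"{mount}: {total_tb:.1f} TB total, {free_tb:.1f} TB free, "
          f"{s.f_files / 1e6:.0f} million inodes")

# The default quota of 1 TB spread over 250,000 inodes implies an
# average file size of about 4 MB if a project exhausts both limits:
print(f"{1e12 / 250_000 / 1e6:.0f} MB average file size at full quota")
```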
NGF will grow in both capacity and bandwidth over the next several years, eventually replacing local storage on individual systems or dwarfing it in capacity. NERSC is also working to seamlessly integrate NGF with the HPSS data archive to create much larger "virtual" data storage for projects. Once NGF is completely operational within the NERSC facility, Butler said, users at other centers such as NCAR and NASA Ames could be given remote access to the NERSC filesystem, allowing them to read and visualize data without having to transfer it by FTP. Eventually, the same capability could be extended to experimental research sites, such as accelerator labs.
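Until that NGF/HPSS integration arrives, moving data into the archive remains an explicit step. As a rough illustration only (the hsi client invocation and all paths shown here are assumptions, not a documented NGF interface), a user might script the archive step like this:

```python
# Illustrative sketch of explicitly archiving an NGF file to HPSS with
# the hsi client; assumes hsi is installed and authenticated, and all
# paths are hypothetical.
import subprocess

def archive_to_hpss(ngf_path: str, hpss_path: str) -> None:
    # hsi accepts a command string; "put <local> : <remote>" stores a
    # local file under the given HPSS name.
    subprocess.run(["hsi", f"put {ngf_path} : {hpss_path}"], check=True)

archive_to_hpss("/project/astro/run42/checkpoint_00001.dat",
                "/archive/astro/run42/checkpoint_00001.dat")
```

With the planned integration, such staging would happen transparently through HSM rather than by hand.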
“The availability of NGF has greatly facilitated my work, and I suspect it will be the way to go in the future,” predicts LBNL’s Martin White.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.