
Parallel I/O Software Infrastructure for Large-Scale Systems


An illustration of how MPI-IO file domain alignment works to minimize file locking and the number of clients accessing each file server on a machine like NERSC's Franklin system.

Key Challenges: The goal is to enable application-driven, large-scale I/O that is fast, portable, scalable across a range of HPC systems, and easy to use. This involves creating optimizations for the MPI-IO library (ROMIO) and the Parallel netCDF library, based on performance evaluations using real applications such as S3D, FLASH, and GCRM.

Why it Matters: The sheer volume and increasing complexity of data generated by simulations or collected by experimentalists are already interfering with the scientific investigation process. Delays in checkpointing simulations or writing output files slow processing unacceptably and reduce machine efficiency.

Accomplishments: MPI-IO file domain alignments that reorganize I/O requests to match the Lustre locking protocol have shown significant performance improvements on Franklin. I/O delegation methods, which use an additional set of compute nodes to enable caching, prefetching, and aggregation, also show promise. I/O delegation is an I/O software layer in which I/O delegate processes, rather than application clients, collaborate to adjust and reorganize I/O requests so as to avoid file system control overheads such as file locking. Results have also been obtained with a version of the Parallel netCDF library that uses non-blocking I/O functions. This new design aggregates small I/O requests into large ones, producing a significant performance improvement. This benefits the climate simulation community, whose simulation programs often perform batched I/O requests on a group of array variables that are individually too small to achieve good performance.
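
To make the file domain alignment idea concrete, the following is a minimal sketch in Python, not ROMIO's actual implementation. It assumes that the file system locks data in units of one stripe, that each I/O aggregator is assigned one contiguous file domain, and that a stripe touched by two aggregators is a potential lock conflict. The function names (even_file_domains, aligned_file_domains, shared_stripes) and the sizes in the usage lines are hypothetical, chosen only for illustration.

```python
def even_file_domains(file_size, num_aggregators):
    """Split [0, file_size) into equal byte ranges, ignoring stripe boundaries."""
    base, extra = divmod(file_size, num_aggregators)
    domains, start = [], 0
    for rank in range(num_aggregators):
        end = start + base + (1 if rank < extra else 0)
        domains.append((start, end))
        start = end
    return domains


def aligned_file_domains(file_size, num_aggregators, stripe_size):
    """Split [0, file_size) so every internal boundary falls on a stripe boundary."""
    if file_size <= 0 or num_aggregators <= 0 or stripe_size <= 0:
        raise ValueError("all sizes must be positive")
    num_stripes = -(-file_size // stripe_size)  # ceiling division
    base, extra = divmod(num_stripes, num_aggregators)
    domains, start = [], 0
    for rank in range(num_aggregators):
        # Hand out whole stripes; the first `extra` ranks get one more.
        stripes = base + (1 if rank < extra else 0)
        end = min(start + stripes * stripe_size, file_size)
        domains.append((start, end))
        start = end
    return domains


def shared_stripes(domains, stripe_size):
    """Count stripes touched by more than one non-empty file domain."""
    owners = {}
    for rank, (start, end) in enumerate(domains):
        if end > start:
            first = start // stripe_size
            last = (end - 1) // stripe_size
            for stripe in range(first, last + 1):
                owners.setdefault(stripe, set()).add(rank)
    return sum(1 for ranks in owners.values() if len(ranks) > 1)


if __name__ == "__main__":
    MIB = 1024 * 1024
    file_size, aggregators, stripe = 10 * MIB, 4, MIB  # illustrative values
    unaligned = even_file_domains(file_size, aggregators)
    aligned = aligned_file_domains(file_size, aggregators, stripe)
    print("unaligned shared stripes:", shared_stripes(unaligned, stripe))
    print("aligned shared stripes:  ", shared_stripes(aligned, stripe))
```

In this simplified model, the even split places two domain boundaries in the middle of a stripe, so two stripes are contended, while the aligned split leaves none shared. The trade-off is that aligned domains may differ in size by up to one stripe.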

Investigators: Alok Choudhary (Northwestern University)

More Information: See, for example, Proceedings of the 2009 International Conference on Parallel Processing, page 470, and Professor Choudhary's web site. 

 

