Ershaad Basheer
Conference Papers
Lisa Gerhardt, Stephen Simms, David Fox, Kirill Lozinskiy, Wahid Bhimji, Ershaad Basheer, Michael Moore, "Nine Months in the Life of an All-flash File System", Proceedings of the 2024 Cray User Group, May 8, 2024.
NERSC’s Perlmutter scratch file system, an all-flash Lustre storage system running on HPE (Cray) ClusterStor E1000 Storage Systems, has a capacity of 36 petabytes and a theoretical peak performance exceeding 7 terabytes per second across HPE’s Slingshot network fabric. Deploying an all-flash Lustre file system was a leap forward in the effort to meet the diverse I/O needs of NERSC. With over 10,000 users representing over 1,000 different projects that span multiple disciplines, a file system that could overcome the performance limitations of spinning disk and reduce performance variation was very desirable. While solid-state storage provided excellent performance gains, there were still challenges that required observation and tuning. Working with HPE’s storage team, NERSC staff engaged in an iterative process that increased performance and provided more predictable outcomes. Using IOR and OBDfilter tests, NERSC staff closely monitored the performance of the file system at regular intervals to inform the process and chart progress. This paper documents the results of, and reports insights derived from, more than nine months of NERSC’s continuous performance testing, and provides a comprehensive discussion of the tuning and adjustments that were made to improve performance.