Superfacility
Mission Statement
The Superfacility concept is a framework for integrating experimental and observational instruments with computational and data facilities. Data produced by light sources, microscopes, telescopes, and other devices can stream in real time to large computing facilities, where it can be analyzed, archived, curated, combined with simulation data, and served to the science user community via powerful computing, storage, and networking systems. Connected with high-speed programmable networking, this superfacility model is more than the sum of its parts. It allows for discoveries across data sets, institutions, and domains and makes data from one-of-a-kind facilities and experiments broadly accessible.
The NERSC Superfacility project identifies the technical and policy challenges this concept poses for an HPC center and coordinates the work to address them in partnership with target science teams. The project aims to ensure that the solutions developed are widely useful (rather than one-off engagements), scale to multiple user groups, and remain sustainable for NERSC staff to support.
Services in Development
Data Management and Sharing
We are working to develop and deploy tools that can be used to handle the large volumes of data generated by superfacility partners.
Data Transfer
- Globus is our tool of choice for large data transfers. We have several optimized data transfer nodes that can access every file system at NERSC (see the sketch after this list).
- We are working to offer a new interface into HPSS that eliminates much of the difficulty of bundling and uploading files.
- A command-line tool for parallel transfers between file systems at NERSC (including HPSS) has been deployed on NERSC systems.
- Batch system integration of data movement is being explored.
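For teams that script their transfers, the sketch below shows one way to drive a Globus transfer from Python using the Globus SDK. It is a minimal sketch, not a NERSC-specific recipe: it assumes you already hold a Globus transfer token, and the endpoint UUIDs and paths are placeholders. The Globus web interface or CLI work equally well.

```python
# Minimal sketch: submitting a Globus transfer toward a NERSC DTN with the
# Globus Python SDK. Endpoint UUIDs, paths, and token handling are
# placeholders, not NERSC-specific values.
import globus_sdk

TRANSFER_TOKEN = "..."                      # obtained via a Globus OAuth2 flow
SOURCE_ENDPOINT = "SOURCE-ENDPOINT-UUID"    # e.g. your facility's DTN
DEST_ENDPOINT = "DEST-ENDPOINT-UUID"        # e.g. a NERSC DTN collection

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe the transfer: recursively copy a results directory.
tdata = globus_sdk.TransferData(
    tc, SOURCE_ENDPOINT, DEST_ENDPOINT, label="beamline results"
)
tdata.add_item("/data/run042/", "/project/myproj/run042/", recursive=True)

# Submit and wait for completion.
task = tc.submit_transfer(tdata)
tc.task_wait(task["task_id"], timeout=3600, polling_interval=30)
```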
Data Discovery
- The NERSC Data Dashboard lets you see where your data is on the Project file system.
- A PI Dashboard is under development to allow PIs to address common issues (like permission drift) for the data they control.
Data Sharing
- Spin, a service platform for deploying science gateways, has been successfully deployed.
- Globus Sharing has been enabled for data on the Project file system (a minimal sharing sketch follows this list).
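Globus Sharing can also be driven programmatically. The following is a minimal sketch, using the Globus Python SDK, of granting a collaborator read access to a directory on a shared (guest) collection. The collection UUID, identity UUID, and path are placeholders; the same operation can be performed through the Globus web interface.

```python
# Minimal sketch (placeholder IDs): granting a collaborator read-only access
# to a directory on a Globus shared/guest collection via the Globus SDK.
import globus_sdk

TRANSFER_TOKEN = "..."                       # obtained via a Globus OAuth2 flow
SHARED_ENDPOINT = "SHARED-COLLECTION-UUID"   # guest collection over the project data
COLLABORATOR_IDENTITY = "GLOBUS-IDENTITY-UUID"

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Grant read-only access to one subdirectory of the shared collection.
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": COLLABORATOR_IDENTITY,
    "path": "/run042/",
    "permissions": "r",
}
tc.add_endpoint_acl_rule(SHARED_ENDPOINT, rule)
```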
The Superfacility API
We recognize that automation is an important driver for the experimental and observational facilities we work with. Automated experiment pipelines need to interact with NERSC without a human in the loop - moving data, launching compute jobs, and managing access. In response to this emerging and increasing need, NERSC has developed a REST-based API for many common functions and queries on our systems. For example, a user can query NERSC center status, submit or query the status of jobs, transfer data into and within NERSC, and get information about other users in their project.
The API is under active development - we are continually adding and refining functionality based on the needs of our partner facilities. For up-to-date information, please see the NERSC API documentation.
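To give a flavor of what this looks like in practice, the sketch below calls a REST API of this kind using Python's requests library. The base URL, endpoint paths, payload fields, and system name are illustrative assumptions only; the authoritative endpoint list, request schemas, and authentication flow are described in the NERSC API documentation.

```python
# Hedged sketch of calling a REST API like the Superfacility API with an
# OAuth2 bearer token. URL, paths, and payload are illustrative; consult the
# NERSC API documentation for the real interface.
import requests

API_BASE = "https://api.nersc.gov/api/v1.2"   # assumed base URL
TOKEN = "..."                                  # obtained via an OAuth2 client-credentials flow
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
MACHINE = "perlmutter"                         # placeholder system name

# Query center/system status without a human in the loop.
status = requests.get(f"{API_BASE}/status", headers=HEADERS, timeout=30)
print(status.json())

# Submit a batch job by posting the path to a job script on the target system
# (illustrative payload; the real schema is defined in the API docs).
job = requests.post(
    f"{API_BASE}/compute/jobs/{MACHINE}",
    headers=HEADERS,
    data={"job": "/global/homes/u/user/run.slurm", "isPath": True},
    timeout=30,
)
print(job.json())
```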
The Superfacility Demo Series (May 2020)
In May 2020, the Superfacility Project held a series of virtual demos of tools and utilities developed at NERSC and ESnet to support the needs of experimental scientists.

May 6, 2020, noon PT
SENSE: Intelligent Network Services for Science Workflows
Xi Yang and the SENSE team
The Software-defined network for End-to-end Networked Science at Exascale (SENSE) is a model-based orchestration system that operates between the SDN layer controlling the individual networks/end-sites and science workflow agents/middleware. The SENSE system includes Network Resource Manager and End-Site Resource Manager components, which enable advanced features in the areas of multi-resource integration, real-time responsiveness, and workflow middleware interactions. The demonstration will show the status of ongoing work to integrate SENSE services with domain science workflows, such as those envisioned for DOE Superfacility operations. A common vision for these integrations is the provisioning of SENSE Layer 2 and Layer 3 services based on knowledge of current and planned data transfers. SENSE allows workflow middleware to redirect traffic at granularities ranging from a single flow, specific end-system, or an entire end-site onto the desired SENSE provisioned services. The SENSE Layer 2 services provide deterministic end-to-end resource guarantees, including the network and Data Transfer Node (DTN) elements. The SENSE Layer 3 service provides the mechanisms for directing desired traffic onto specific Layer 3 VPN (L3VPN) for policy and/or quality of service reasons.

May 13, 2020, noon PT
Data Management Tools and Capabilities
Lisa Gerhardt and Annette Greiner
The PI Dashboard is a web portal that will allow PIs to address many of the common permission issues that come up when dealing with shared files on the Community File System. GHI is a new GPFS / HPSS interface that offers the benefits of a more familiar file system interface for HPSS. Often, users want to store complex directory structures or large bundles in HPSS, which can be difficult to do with the traditional HPSS access tool. GHI can easily move data between HPSS and the GPFS file system with a few simple commands. NERSC has written several command line data transfer scripts to help users integrate data transfers into their workflows. We’ll do a brief demo of these scripts.

May 20, 2020, noon PT
Superfacility API: Automation for Complex Workflows at Scale
Gabor Torok, Cory Snavely, Bjoern Enders
The Superfacility API aims to enable the use of all NERSC resources through purely automated means using popular development tools and techniques. An evolution of its predecessor, NEWT, the newly designed API adds features designed to support complex, distributed workflows, such as placing future job reservations and registration of API callbacks for asynchronous processes. It will also allow users to offload tedious tasks such as large data movement via simple REST calls. While the Superfacility API is designed for non-interactive use, this demonstration will use a Jupyter notebook to step through a working example that calls the API to conduct a simple workflow process. The discussion will include additional information on planned API endpoints and authentication methods.

May 27, 2020, noon PT
Docker Containers and Dark Matter: An Overview of the Spin Container Platform with Highlights from the LZ Experiment
Cory Snavely, Quentin Riffard, Tyler Anderson
Spin is a container-based platform at NERSC designed for deploying science gateways, workflow managers, databases, API endpoints, and other network services to support scientific projects. Spin leverages the portability, modularity, and speed of Docker containers to allow NERSC users to deploy pre-built software images or design their own quickly. The underlying Rancher orchestration system provides a secure, managed infrastructure with access to NERSC systems, storage, and networks. One project making use of Spin as part of its engagement with the Superfacility project is the LZ Dark Matter Experiment, which is preparing to operate a 10-ton, liquid-xenon-based detector a mile underground at the Sanford Underground Research Facility (SURF) in South Dakota. The collaboration of some 250 scientists and 37 research institutions is busily readying the detector and associated software and data systems. Services that will run in Spin to support the LZ Experiment range from databases to data transfer monitoring and have been exercised during mock data challenges. In this demonstration, NERSC staff will give an overview of the Spin platform and show how a simple service is created in a few seconds. LZ staff will then describe the science of dark matter detection and give an overview of their work in Spin so far, focusing on the Event Viewer, a science gateway that allows researchers to examine significant detector events.

June 3, 2020, noon PT
Jupyter
Matthew Henderson (with Shreyas Cholia and Rollin Thomas)
Large scale “Superfacility” type experimental science workflows require support for a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. Here, we demonstrate how the Jupyter platform plays a key role in this space - it provides the ease of use and interactivity of a web science gateway while providing scientists the ability to build custom, ad-hoc workflows in a composable way. Using real-world use cases from the National Center for Electron Microscopy (NCEM), we show how Jupyter facilitates interactive data analysis at scale on NERSC HPC resources.
Jupyter Notebooks combine live executable code cells with inline documentation and embedded interactive visualizations. This allows us to capture an experiment in a fully contained executable Notebook that is self-documenting and incorporates live rendering of outputs and results as they are generated. The Notebook format lends itself to a highly modular and composable workflow, where individual steps and parameters can be adjusted on the fly. The Jupyter platform can also support custom applications and extensions that live alongside the core Notebook interface. We will use real world science examples to show how we create an improved interactive HPC experience in Jupyter, including:
- Improvements to the NERSC JupyterHub Deployment
- Scaling up code in a Jupyter notebook to run on HPC resources through the use of parallel task execution frameworks
- Demonstrating the use of the Dask task framework as a backend to manage workers from Jupyter
- Enabling project-wide workflows and collaboration through sharing and cloning notebooks and their associated software environments
We will also discuss related projects and potential future directions.
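As a companion to the Jupyter demo abstract above, the following is a minimal sketch of the pattern it describes: a notebook cell connecting to a Dask cluster running on HPC compute nodes and scaling an analysis across its workers. The scheduler file path is a placeholder for however the cluster was launched (for example, inside a batch job); this is not the demo's actual notebook.

```python
# Minimal sketch: driving parallel work on HPC resources from a Jupyter
# notebook cell via Dask. The scheduler file path is a placeholder for a
# Dask cluster already started on the compute nodes.
from dask.distributed import Client
import dask.array as da

# Connect to the running Dask scheduler (e.g. launched by a batch job).
client = Client(scheduler_file="/path/to/scheduler.json")

# Build a large array distributed across the workers and reduce it.
x = da.random.random((100_000, 10_000), chunks=(10_000, 10_000))
result = x.mean(axis=0).compute()   # executes on the HPC workers
print(result[:5])
```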
Papers and Posters Related to the Superfacility Model
- “Superfacility: The Convergence of Data, Compute, Networking, Analytics, and Software,” Book chapter in Handbook on Big Data and Machine Learning in the Physical Sciences, pp. 361-386 (2020)
- “Interactive Supercomputing with Jupyter,” Rollin Thomas and Shreyas Cholia (2021).
- “Cross-facility science with the Superfacility Project at LBNL,” Bjoern Enders and the LBNL Superfacility team, XLOOP workshop at SC20 (2020).
- “Interactive Parallel Workflows for Synchrotron Tomography,” Dilworth Parkinson, Harinarayan Krishnan, Daniela Ushizima, Matthew Henderson, Shreyas Cholia, XLOOP workshop at SC20 (2020).
- “Software-Defined Network for End-to-end Networked Science at the Exascale,” I. Monga, C. Guok, J. MacAuley, A. Sim, H. Newman, J. Balcas, P. DeMar, L. Winkler, T. Lehman, X. Yang (2020)
- “Exploring Metadata Search Essentials for Scientific Data Management,” Wei Zhang, Suren Byna, Chenxu Niu, and Yong Chen, 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (2019).
- “MIQS: metadata indexing and querying service for self-describing file formats,” Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, and Yong Chen, SC19 (2019).
- “I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance,” Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, and Suren Byna, CUG19 (2019).
- “SDN for End-to-End Networked Science at the Exascale (SENSE),” I. Monga, C. Guok, J. MacAuley, A. Sim, H. Newman, J. Balcas, P. DeMar, L. Winkler, T. Lehman, X. Yang, Innovate the Network for Data-Intensive Science Workshop (INDIS 2018) at SC18 (2018).
- “DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems,” Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen, The 27th International Conference on Parallel Architectures and Compilation Techniques (PACT’18) (2018).
- “Enabling a SuperFacility with Software Defined Networking,” Richard Shane Canon, Tina Declerck, Brent Draney, Jason Lee, David Paul, David Skinner, CUG 2017
Articles about the Superfacility Model
- “Super-connected HPC,” DEIXIS magazine, (2021)
- “A COSMIC Approach to Nanoscale Science” (2021)
- “Superfacility Model Brings COVID Research Into Real Time” (2021)
- “Superfacility Framework Advances Photosynthesis Research” (2019)
- “The Superfacility Concept” feature episode of NERSC podcast (2019)
- “Superfacility – How new workflows in the DOE Office of Science are changing storage system requirements” (2016)
- “ESnet Paves Way for HPC “Superfacility” Real-Time Beamline Experiments” (2015)
Talks about the Superfacility Model
- “Cross-Facility Science: The Superfacility Model at Lawrence Berkeley National Laboratory,” State of the Practice talk, SC20 (Debbie Bard, 2020).
- “Visual Data Management at NERSC,” Lisa Gerhardt and Annette Greiner, State of the Practice talk, SC20 (2020).
- “A User-Centered Data Management System for HPC,” Lisa Gerhardt and Annette Greiner, DOE Data Day (2020).
- “Cross-facility science with the Superfacility Project at LBNL,” XLOOP workshop, SC20 (2020, Bjoern Enders).
- “The NERSC Superfacility Project: A Technical Overview,” GPUs for Science (2019, Cory Snavely).
- “Supercomputing and the Scientist: How HPC and analytics are transforming Experimental Science,” Keynote at DataTech, Edinburgh, UK (Debbie Bard).
- “A Superfacility Model for Data-Intensive Science,” JuliaCon (2017, Kathy Yelick).