
TOKIO: Total Knowledge of I/O

The Total Knowledge of I/O (TOKIO) project is developing algorithms and a software framework to analyze I/O performance and workload data from production HPC resources at multiple system levels. This holistic I/O characterization framework gives application scientists, facility operators, and computer science researchers a clearer view of system behavior and the causes of deleterious behavior. TOKIO is a collaboration between Lawrence Berkeley National Laboratory and Argonne National Laboratory, funded by the DOE Office of Science through the Office of Advanced Scientific Computing Research; its reference implementation is open for contributions and download on GitHub.

TOKIO Architecture

TOKIO is a software framework designed to encapsulate the mechanics of the various I/O monitoring and characterization tools used in HPC centers around the world, reducing the burden on institutional I/O experts to maintain a complete understanding of how each tool works and what data it provides. It comprises three distinct layers:

  • TOKIO connectors are modules that interface directly with component-level monitoring tools such as Darshan, LMT, or mmperfmon. They simply convert the data emitted by a specific tool into an in-memory object that can be manipulated by the other layers of the TOKIO framework.
  • TOKIO tools combine site-specific knowledge with different connectors and expose interfaces that provide data from parts of the storage subsystem without requiring a deep understanding of the specific tools used by an HPC center. For example, a tool may answer the question “What was the I/O performance of job 5723433?” by knowing how to use that jobid to locate any and all monitoring data that represent I/O performance.
  • TOKIO analysis apps and data services are more sophisticated analyses, visualization tools, and data management utilities that combine tools and connectors to provide a holistic view of all components in the I/O subsystem. A minimal sketch of how these layers fit together follows this list.
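As a rough illustration of how these layers fit together, the Python sketch below mimics the pattern. The module and function names (darshan_connector, lmt_connector, job_io_summary_tool) are hypothetical stand-ins for illustration, not the actual pytokio API:

# Illustrative sketch of TOKIO's layering; the names used here are
# hypothetical stand-ins, not the real pytokio modules.

def darshan_connector(log_path):
    """Connector: parse one Darshan log into a plain in-memory dict."""
    # A real connector would read the log (e.g., via darshan-parser) here.
    return {"jobid": "5723433", "agg_perf_gibs": 42.7}

def lmt_connector(start, end):
    """Connector: pull server-side Lustre traffic (LMT) for a time window."""
    return {"read_gib": 1013.2, "write_gib": 8675.3}

def job_io_summary_tool(jobid, site_db):
    """Tool: apply site-specific knowledge (jobid -> log path and time window)
    so callers need not know which monitoring tools produced the data."""
    log_path, start, end = site_db[jobid]       # site-specific lookup
    app_view = darshan_connector(log_path)      # application-level view
    server_view = lmt_connector(start, end)     # file-system-level view
    return {"application": app_view, "file_system": server_view}

# Analysis layer: ask the question in terms of the job, not the tools.
site_db = {"5723433": ("/logs/hacc_5723433.darshan", 1500000000, 1500003600)}
print(job_io_summary_tool("5723433", site_db))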
Overview of the TOKIO architecture

An example of a TOKIO analysis app is the Unified Monitoring and Metrics Interface (UMAMI), which provides a simple visualization of how different components of the I/O subsystem were performing over a time period of interest.
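UMAMI works by plotting each metric's recent history next to its latest measurement so that outliers stand out. The snippet below is a minimal, pandas-based sketch of that idea using made-up numbers; the metric names and the interquartile-range test are illustrative, not the pytokio UMAMI implementation:

import pandas as pd

# Hypothetical daily metrics for one file system; in practice these would come
# from TOKIO tools (job I/O performance, server-side traffic, fullness, etc.).
history = pd.DataFrame(
    {
        "job_perf_gibs":    [35.1, 34.8, 36.0, 35.5, 12.2],  # last day looks anomalous
        "server_write_pib": [1.10, 1.00, 1.20, 1.10, 1.15],
        "fs_fullness_pct":  [61.0, 62.0, 63.0, 64.0, 65.0],
    },
    index=pd.date_range("2017-02-14", periods=5, freq="D"),
)

# UMAMI-style rollup: flag metrics whose newest value falls outside the
# interquartile range of their own history.
latest = history.iloc[-1]
q1 = history.iloc[:-1].quantile(0.25)
q3 = history.iloc[:-1].quantile(0.75)
flagged = (latest < q1) | (latest > q3)
print(pd.DataFrame({"latest": latest, "q1": q1, "q3": q3, "outside_iqr": flagged}))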

Unified Monitoring and Metrics Interface (UMAMI) view of an anomalously performing HACC job on Edison’s scratch3 file system during 2017.

The complete TOKIO architecture is described in a paper presented at the 2018 Cray User Group meeting.

Getting Started

pytokio is the Python implementation of the TOKIO framework; you can find pytokio’s source on GitHub and the latest release on the Python Package Index (PyPI).

pytokio at NERSC

To give pytokio a try, you can use it to access I/O data at NERSC. Install it in a conda environment:

$ module load python/3.6-anaconda-5.2
$ conda create -n pytokio python=3.6 ipykernel pip
$ source activate pytokio
$ pip install --pre pytokio[collectdes,nersc_globuslogs,esnet_snmp,lmtdb,nersc_jobsdb]
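
After the install completes, a quick sanity check is to import the package (pytokio installs under the module name tokio; printing the module path is just one way to confirm which environment provided it):

$ python -c "import tokio; print(tokio.__file__)"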

If you’d like to use pytokio from within the Jupyter service at NERSC, there’s just one extra step to create a pytokio-enabled kernel:

$ python -m ipykernel install --user --name pytokio --display-name pytokio

See NERSC’s Jupyter documentation for more information on how to create your own Jupyter kernels.

pytokio on your laptop

To give pytokio a try on your own computer, you can download all of the code and data necessary to reproduce a paper presented at SC’18, A Year in the Life of a Parallel File System, and work through the provided Jupyter notebooks.

The pytokio repository includes an examples directory with notebooks that should work from NERSC’s Jupyter service. Additional documentation can be found on the official pytokio documentation site.


Related Work

A Year in the Life of a Parallel File System

The TOKIO team presented a paper titled "A Year in the Life of a Parallel File System" at the 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) that demonstrates new techniques for classifying the sources of performance variation over time. A year-long dataset documenting I/O performance variation on file systems at NERSC and the Argonne Leadership Computing Facility (ALCF) was then analyzed with these techniques to demonstrate their efficacy.

Designing an All-Flash File System

NERSC's Perlmutter system will feature a 30 PB all-NVMe Lustre file system capable of over 4 TB/sec write bandwidth. Because this file system will be the first all-NVMe file system deployed at this scale, NERSC undertook extensive quantitative analysis to answer questions such as: Will 30 PB of capacity be enough for a system of Perlmutter's capability? What is the best SSD endurance rating to balance cost against the system's five-year service life?