TOKIO: Total Knowledge of I/O
The Total Knowledge of I/O (TOKIO) project is developing algorithms and a software framework to analyze I/O performance and workload data from production HPC resources at multiple system levels. This holistic I/O characterization framework gives application scientists, facility operators, and computer science researchers a clearer view of system behavior and of the causes of deleterious I/O performance. TOKIO is a collaboration between Lawrence Berkeley and Argonne National Laboratories, funded by the DOE Office of Science through the Office of Advanced Scientific Computing Research, and its reference implementation is open for contributions and download on GitHub.
TOKIO Architecture
TOKIO is a software framework designed to encapsulate the mechanics of the various I/O monitoring and characterization tools used in HPC centers around the world, reducing the burden on institutional I/O experts to maintain a complete understanding of how each tool works and what data it provides. It comprises three distinct layers:
- TOKIO connectors are modules that interface directly with component-level monitoring tools such as Darshan, LMT, or mmperfmon. They simply convert the data emitted by a specific tool into an in-memory object that can be manipulated by the other layers of the TOKIO framework.
- TOKIO tools combine site-specific knowledge with one or more connectors and expose interfaces that provide data from parts of the storage subsystem without requiring a deep understanding of the specific tools deployed at an HPC center. For example, a tool may answer the question "What was the I/O performance of job 5723433?" by knowing how to use that job ID to locate all of the monitoring data that describe the job's I/O performance.
- TOKIO analysis apps and data services are more sophisticated analyses, visualization tools, and data management utilities that combine tools and connectors to provide a holistic view of all components in the I/O subsystem.
An example of a TOKIO analysis app is the Unified Monitoring and Metrics Interface (UMAMI) which provides a simple visualization of how different components of the I/O subsystem were performing over a time of interest.
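To make the layering concrete, the short Python sketch below contrasts a connector call with a tool call. It assumes a working pytokio installation and a site configuration that points at the relevant data sources; the module, class, and function names follow the pytokio documentation as of this writing and may differ between releases, and the log path, file system name, and dates are placeholders.

import datetime
import tokio.connectors.darshan
import tokio.tools.hdf5

# Connector layer: parse one Darshan log directly.  The path is a placeholder;
# point it at any Darshan log you have access to.
darshan_log = tokio.connectors.darshan.Darshan('/path/to/some_app_id12345.darshan')
darshan_log.darshan_parser_base()   # wraps darshan-parser and loads its output into memory
print(darshan_log['header'])        # dict-like view of the parsed log

# Tool layer: pull file system traffic for a time range without knowing where
# the underlying LMT data live; the site configuration resolves that detail.
# 'cscratch' and 'datatargets/readbytes' are example names from a NERSC-style configuration.
read_bytes = tokio.tools.hdf5.get_dataframe_from_time_range(
    'cscratch',
    'datatargets/readbytes',
    datetime.datetime(2019, 1, 1, 0, 0, 0),
    datetime.datetime(2019, 1, 2, 0, 0, 0))
print(read_bytes.sum().sum(), 'bytes read from cscratch over that day')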
The complete TOKIO architecture is described in a paper presented at the 2018 Cray User Group meeting.
Getting Started
pytokio is the Python implementation of the TOKIO framework; its source is hosted on GitHub, and the latest release is available on the Python Package Index (PyPI).
pytokio at NERSC
To give pytokio a try, you can use it to access I/O data at NERSC. Install it in a conda environment:
$ module load python/3.6-anaconda-5.2
$ conda create -n pytokio python=3.6 ipykernel pip
$ source activate pytokio
$ pip install --pre pytokio[collectdes,nersc_globuslogs,esnet_snmp,lmtdb,nersc_jobsdb]
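After installation, a two-line check from within the activated environment confirms that the package imports cleanly. Note that the package installs as pytokio but is imported as tokio; the __version__ attribute is assumed to be present here, and pip show pytokio is a fallback if it is not.

import tokio
print(tokio.__version__)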
If you’d like to use pytokio from within the Jupyter service at NERSC, there’s just one extra step to create a kernel that is pytokio-enabled:
$ python -m ipykernel install --user --name pytokio --display-name pytokio
See NERSC’s Jupyter documentation for more information on how to create your own Jupyter kernels.
pytokio on your laptop
To give pytokio a try on your own computer, you can download all of the code and data necessary to reproduce a paper presented at SC’18, A Year in the Life of a Parallel File System, and work through the provided Jupyter notebooks.
The pytokio repository includes an examples directory with notebooks that should work from NERSC's Jupyter service. Additional documentation can be found on the official pytokio documentation site.
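If you would rather explore the SC'18 dataset directly instead of through the notebooks, the TOKIO Time Series (HDF5) files it contains can be opened with pytokio's HDF5 connector. The sketch below is illustrative only: the file name is hypothetical, and the to_dataframe method and dataset name are recalled from the pytokio documentation, so check the current API documentation if they have changed.

import tokio.connectors.hdf5

# Open one of the downloaded TOKIO Time Series files; the connector behaves
# much like an h5py.File with convenience methods layered on top.
hdf5_file = tokio.connectors.hdf5.Hdf5('cscratch_2017-03-01.hdf5')  # hypothetical file name
read_bytes = hdf5_file.to_dataframe('datatargets/readbytes')        # one column per storage target
print(read_bytes.sum().sum(), 'bytes read across the whole file system that day')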
Participants
- Nicholas J. Wright (LBNL) - Lead Principal Investigator
- Philip Carns (ANL) - Institutional Principal Investigator
- Suren Byna (LBNL) - Co-investigator
- Rob Ross (ANL) - External collaborator
- Prabhat - External collaborator
- Glenn K. Lockwood
- Shane Snyder (ANL)
Publications
- Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright. “Understanding Data Motion in the Modern HPC Data Center.” In Proceedings of the 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW). Denver, CO. November 2019. (Slides)
- Teng Wang, Suren Byna, Glenn K. Lockwood, Shane Snyder, Philip Carns, Sunggon Kim, Nicholas J. Wright. “A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks.” In Proceedings of the 19th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGRID). Larnaca, Cyprus. May 2019.
- Glenn K. Lockwood, Shane Snyder, Teng Wang, Suren Byna, Philip Carns, Nicholas J. Wright. “A Year in the Life of a Parallel File System.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’18). Dallas, TX. November 2018. (Slides)
- Jakob Luttgau, Shane Snyder, Philip Carns, Justin Wozniak, Julian Kunkel, and Thomas Ludwig. “Toward Understanding I/O Behavior in HPC Workflows.” In Proceedings of the 3rd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS’18). Dallas, TX. November 2018. (Slides)
- Teng Wang, Shane Snyder, Glenn K. Lockwood, Philip Carns, Nicholas J. Wright, and Suren Byna. “IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs.” In Proceedings of the 2018 IEEE International Conference on Cluster Computing (CLUSTER). Belfast, UK. September 2018.
- Glenn K. Lockwood, Shane Snyder, George Brown, Kevin Harms, Philip Carns, Nicholas J. Wright. “TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis.” In Proceedings of the 2018 Cray User Group. Stockholm, SE. May 2018. (Slides)
- Glenn K. Lockwood, Wucherl Yoo, Suren Byna, Nicholas J. Wright, Shane Snyder, Kevin Harms, Zachary Nault, Philip Carns. “UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis.” In Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS’17). Denver, CO. November 2017. (Slides)
- Shane Snyder, Philip Carns, Kevin Harms, Robert Ross, Glenn K. Lockwood, Nicholas J. Wright. “Modular HPC I/O Characterization with Darshan.” In Proceedings of 5th Workshop on Extreme-scale Programming Tools (ESPT 2016). Salt Lake City, UT. November 2016.
- Cong Xu, Suren Byna, Vishwanath Venkatesan, Robert Sisneros, Omkar Kulkarni, Mohamad Chaarawi, and Kalyana Chadalavada. “LIOProf: Exposing Lustre File System Behavior for I/O Middleware.” In Proceedings of the 2016 Cray User Group. London, UK. May 2016.
Presentations
- Philip Carns. “Understanding and tuning HPC I/O: How hard can it be?” 4th annual HPC I/O in the Data Center Workshop (HPC-IODC) and Workshop on Performance and Scalability of Storage Systems (WOPSSS), ISC 2018, Frankfurt, DE. June 2018.
- Philip Carns, Julian Kunkel, Glenn K. Lockwood, Ross Miller, Eugen Betke, Wolfgang Frings. “Analyzing Parallel I/O.” Birds of a Feather session, International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), Denver, USA. November 2017.
- Philip Carns. “Characterizing data-intensive scientific applications with Darshan.” CS/NERSC Data Seminar, National Energy Research Scientific Computing Center. June 2017.
- Philip Carns. “Characterizing HPC I/O: from Applications to Systems.” ZIH Colloquium at Technische Universität Dresden, Dresden, DE. April 2017.
- Philip Carns. “TOKIO: Using Lightweight Holistic Characterization to Understand, Model, and Improve HPC I/O Performance.” SIAM Conference on Computational Science and Engineering, Atlanta, GA. March 2017.
- Shane Snyder. “Leveraging Holistic Characterization for Insights into HPC I/O Behavior.” 2017 Understanding I/O Performance Behavior (UIOP) Workshop, DKRZ, Hamburg. March 2017.
- Glenn K. Lockwood, Nicholas J. Wright. “Understanding I/O performance on burst buffers through holistic I/O characterization.” MCS Seminar, Argonne National Laboratory. May 2016.
- Glenn K. Lockwood. “Developing a holistic understanding of I/O workloads on future architectures.” 2016 SIAM Conference on Parallel Processing for Scientific Computing, Paris. April 2016.
- Julian Kunkel, Philip Carns, Shane Snyder, Huong Luu, Matthieu Dorier, Wolfgang Frings, and Glenn K. Lockwood. “Analyzing Parallel I/O.” Birds of a Feather session, International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin. November 2015.
Related Work
- Sandeep Madireddy, Prasanna Balaprakash, Philip Carns, Robert Latham, Robert Ross, Shane Snyder, and Stefan M. Wild. “Machine Learning Based Parallel I/O Predictive Modeling: A Case Study on Lustre File Systems.” In Proceedings of the 33rd International Conference, ISC High Performance 2018. Frankfurt, DE. 2018.
- Wahid Bhimji, Debbie Bard, Melissa Romanus, et al. “Accelerating science with the NERSC burst buffer early user program.” 2016 Cray User Group, London. May 2016.
A Year in the Life of a Parallel File System
The TOKIO team presented a paper titled "A Year in the Life of a Parallel File System" at the 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) that demonstrates new techniques for classifying the sources of performance variation over time. A year-long dataset documenting I/O performance variation on file systems at NERSC and the Argonne Leadership Computing Facility (ALCF) was then analyzed with these techniques to demonstrate their efficacy and… Read More »
Designing an All-Flash File System
NERSC's Perlmutter system will feature a 30 PB all-NVMe Lustre file system capable of over 4 TB/sec write bandwidth. Because this will be the first all-NVMe file system deployed at such a scale, NERSC undertook extensive quantitative analysis to answer the following questions: Will 30 PB of capacity be enough for a system of Perlmutter’s capability? What SSD endurance rating best balances the system’s cost against the longevity needed for its five-year service life? How… Read More »