High Throughput Workflow Tools and Strategies, May 30, 2025

Introduction

NERSC is hosting a webinar presented by William Arndt of NERSC and Geoffrey Lentner of Advanced Computing, Purdue University. The webinar is open to the general public.

Date and Time: 8:30 am - 12:00 pm (Pacific time), Friday, May 30, 2025

Abstract

Although high-performance computing (HPC) infrastructure is designed to maximize the performance of single large applications, it is equally capable of running high-throughput workloads, which scale by running large numbers of small compute tasks. Many valuable scientific problems are well solved by this pattern of small, independent processes, for example in bioinformatics, high energy physics, Monte Carlo methods, and data processing.

This training session will discuss and demonstrate three software tools well suited to managing high-throughput workloads, presented in order of increasing complexity and power: GNU Parallel, Snakemake, and Hypershell. The session will also cover how to combine these tools effectively, how to identify performance bottlenecks, and best practices for improving the performance of throughput workloads. This training is intended for Linux terminal users.
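The high-throughput pattern described in the abstract, many small and independent tasks dispatched to a limited pool of workers, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how GNU Parallel, Snakemake, or Hypershell work internally: each task is assumed to be an arbitrary command run as its own process, and the names `run_task`, `run_all`, and `TaskResult` are hypothetical.

```python
"""Minimal task-farm sketch: run many small, independent commands with a
bounded number of concurrent workers. Illustrative only; this is not a
reimplementation of any tool covered in the session."""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TaskResult:
    index: int        # position of the task in the input list
    returncode: int   # exit status of the task's process
    stdout: str       # captured standard output


def run_task(index: int, argv: list[str]) -> TaskResult:
    # Each task runs as its own OS process, so tasks share no state.
    proc = subprocess.run(argv, capture_output=True, text=True)
    return TaskResult(index, proc.returncode, proc.stdout)


def run_all(commands: list[list[str]], max_workers: int = 4) -> list[TaskResult]:
    # The threads only wait on child processes; the real work happens in the
    # children, so max_workers bounds how many tasks run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_task, i, argv) for i, argv in enumerate(commands)]
        # Collect in submission order so results line up with the inputs.
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical workload: 20 tiny Python one-liners standing in for real
    # tasks (for example, one Monte Carlo batch or one input file per task).
    cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(20)]
    results = run_all(cmds, max_workers=4)
    failed = [r.index for r in results if r.returncode != 0]
    print("outputs:", [r.stdout.strip() for r in results])
    print("failed tasks:", failed)
```

Because every task is an independent process, a failing task does not stop the others; its nonzero exit status is simply recorded for later inspection. Dedicated tools add capabilities this sketch lacks, such as tracking dependencies between tasks in Snakemake.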

Registration

Registration Link

Presentation Materials

TBA