
Deep Learning at Scale Training, March 3-4, 2025

March 3, 2025

NERSC and NVIDIA are hosting a hybrid, hands-on Deep Learning at Scale training event on March 3-4 in Berkeley, CA. The training will help users explore distributed training of deep learning models on high-performance computing systems, specifically Perlmutter. It focuses on building a large-scale deep learning model for a real scientific application (transformers for weather forecasting) and walks users through three stages: profiling tools and performance optimization on a single GPU; scaling to multiple GPUs and nodes through distributed training with data parallelism, along with tips and techniques for scaling; and advanced parallelization for very large models with model parallelism.

We will provide example code and datasets so that attendees can experiment hands-on with optimized and scalable distributed training of our scientific deep learning model on Perlmutter. Because of the hands-on experiments on Perlmutter, attendance will be capped. However, all training material and the lecture recordings will be made available after the event. OLCF and ALCF users are welcome to attend. Training accounts will be provided if needed.

Logistics

This event will be hybrid. Details on the onsite location (in B59; see visitor information for more details) and the Zoom link will be shared soon.

Agenda

All times are in the Pacific time zone. The agenda below is tentative.

Day 1: March 3

Time | Topic | Presenter
09:00 - 10:00 | Introduction + Perlmutter Setup | Shashank Subramanian (NERSC) and Steven Farrell (NERSC)
10:00 - 10:15 | Break |
10:15 - 11:00 | Deep Learning Performance on a GPU | Josh Romero (NVIDIA)
11:00 - 12:00 | Hands-on: Profiling and Optimizing GPU Training | Josh Romero (NVIDIA) and NERSC
12:00 - 13:00 | Discussions | NERSC

Day 2: March 4

Time | Topic | Presenter
09:00 - 09:30 | Scaling with Data Parallelism | Steven Farrell (NERSC)
09:30 - 10:30 | Hands-on: Data Parallelism | NERSC
10:30 - 10:45 | Break |
10:45 - 11:15 | Scaling with Model Parallelism | Shashank Subramanian (NERSC)
11:15 - 12:15 | Hands-on: Model Parallelism | NERSC
12:15 - 13:00 | Discussions | NERSC

Registration

Registration is on a first-come, first-served basis and will be capped due to the hands-on nature of the event. To attend, please fill out the registration form.