OLCF AI Training Series: AI for Science at Scale – Part 2, Oct 12, 2023
Held October 12, 2023, this session is the second part of the OLCF’s AI for Science at Scale training series and is open to NERSC users. Part one covered how to train a deep learning model in a distributed fashion across multiple GPUs of the Summit supercomputer using data parallelism. Building on that, part two focuses on training a model on multiple GPUs across nodes of the Frontier supercomputer and demonstrates model parallelism techniques and frameworks such as DeepSpeed, FSDP, and Megatron.
How to Apply
Please visit the training event page for registration information.
Time (EDT) | Topic | Speaker |
1 - 1:45 p.m. | Scaling, LLMs | Sajal Dash (OLCF, Analytics & AI Methods at Scale) |
1:45 - 2 p.m. | Scientific Applications | Sajal Dash |
2 - 3 p.m. | Hands-on Examples | Sajal Dash |
Training Materials
Slides and recordings are available at https://github.com/olcf/ai-training-series.