NERSC End-to-End LLM Bootcamp, May 2024
The NERSC End-to-End LLM Bootcamp will be hosted virtually over three days, May 7-9, 2024, and is co-organized by NERSC with support from the OpenACC organization and NVIDIA.
The End-to-End LLM (Large Language Model) Bootcamp is designed from a real-world perspective and follows the data processing, development, and deployment pipeline paradigm. Attendees walk through the workflow of preprocessing the SQuAD (Stanford Question Answering Dataset) dataset for a question-answering task, fine-tuning BERT (Bidirectional Encoder Representations from Transformers) on it, and applying prompt-learning strategies using NVIDIA® NeMo™ and a transformer-based language model, NVIDIA Megatron. Attendees will also learn to optimize an LLM using NVIDIA TensorRT™, an SDK for high-performance deep learning inference; guardrail the model's prompts and responses using NeMo Guardrails; and deploy the AI pipeline using NVIDIA Triton™ Inference Server, open-source software that standardizes AI model deployment and execution across every workload.
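To give a flavor of the SQuAD-style extractive question-answering task covered in the labs, below is a minimal, illustrative Python sketch. It is not taken from the bootcamp materials (the labs use NVIDIA NeMo); it instead uses the Hugging Face transformers library, and the model checkpoint named in it is only an example, not one prescribed by the bootcamp.

    # Minimal illustrative sketch only; NOT the bootcamp's NeMo-based lab code.
    # Requires: pip install transformers torch
    from transformers import pipeline

    # The model name below is an assumption for illustration; any BERT
    # checkpoint fine-tuned on SQuAD would serve the same purpose.
    qa = pipeline(
        "question-answering",
        model="bert-large-uncased-whole-word-masking-finetuned-squad",
    )

    # Extractive QA: the model selects the answer span from the given context.
    result = qa(
        question="What does BERT stand for?",
        context=(
            "BERT (Bidirectional Encoder Representations from Transformers) "
            "is a transformer-based language model used for tasks such as "
            "extractive question answering on SQuAD."
        ),
    )
    print(result["answer"], result["score"])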
This virtual bootcamp is a hands-on learning experience in which you will be guided through step-by-step instructions, with teaching assistants on hand to help throughout.
Eligibility and Prerequisites
This event is open to the general HPC community. However, due to limited capacity, active NERSC, ALCF, and OLCF users will receive priority consideration.
Prerequisites:
- Basic experience with Python and PyTorch.
- No GPU programming experience is required.
Agenda
All times are in the Pacific time zone.
Day 1 (May 7)
09:00 AM - 10:00 AM | Cluster Connection Session
09:00 AM - 09:10 AM | Welcome and Introduction
09:10 AM - 09:20 AM | Cluster Overview
09:20 AM - 10:00 AM | Overview of Large Language Models and NeMo Megatron (Lecture)
10:00 AM - 10:30 AM | NeMo Fundamentals (Lab)
10:30 AM - 10:45 AM | Break
10:45 AM - 12:15 PM | NeMo Question Answering (Lab)
12:15 PM - 12:45 PM | Wrap-up Q&A
12:45 PM - 01:30 PM | LLM Projects Discussion
Day 2 (May 8)
09:00 AM - 09:45 AM | Prompt Tuning/P-Tuning (Lecture)
09:45 AM - 10:30 AM | Prompt Tuning/P-Tuning (Lab)
10:30 AM - 10:45 AM | Break
10:45 AM - 11:00 AM | NeMo Megatron-GPT 1.3B: Language Model Inferencing (Lab)
11:00 AM - 11:35 AM | Llama 2 7B Inference using TensorRT-LLM (Lecture and Lab)
11:35 AM - 12:05 PM | Llama 2 7B Deployment using Triton Inference Server (Lab)
12:05 PM - 12:30 PM | Wrap-up Q&A
12:30 PM - 01:30 PM | LLM Projects Discussion
Day 3 (May 9)
09:00 AM - 10:00 AM | NeMo Guardrails Overview and Architecture (Lecture)
10:00 AM - 10:20 AM | NeMo Guardrails Topical Rails (Lab)
10:20 AM - 10:40 AM | NeMo Guardrails Jailbreak Rails (Lab)
10:40 AM - 11:00 AM | NeMo Guardrails Grounding Rails (Lab)
11:00 AM - 11:15 AM | Break
11:15 AM - 11:35 AM | NeMo Guardrails Moderation Rails (Lab)
11:35 AM - 12:20 PM | NeMo Guardrails LangChain and Prompt Templates (Lab)
12:20 PM - 12:50 PM | NeMo LLM Service Demo
12:50 PM - 01:30 PM | Wrap-up Q&A
Application
The application deadline is April 24, 2024. For the application and more details, please visit the event page. Please note that acceptance is not confirmed until you receive a confirmation email.
Presentation Materials
Please find the presentation materials (password-protected) for this event here.