Fundamental CUDA Optimization (Part 2) -- Part 4 of 9 CUDA Training Series, Apr 16, 2020
Introduction
CUDA® is a parallel computing platform and programming model that extends C++ to allow developers to program GPUs with a familiar programming language and simple APIs.
NVIDIA will present a 9-part CUDA training series intended to help new and existing GPU programmers understand the main concepts of the CUDA platform and its programming model. Each part will include a 1-hour presentation and example exercises. The exercises are meant to reinforce the material from the presentation and can be completed during a 1-hour hands-on session following each lecture (for in-person participants) or on your own (for remote participants). OLCF and NERSC will both be holding events for each part of the series. Please see the topics and dates for each part here.
Part 4 of 9: Fundamental CUDA Optimization (Part 2)
Date and Time: 10 am - 12 pm (Pacific time), Thursday, April 16, 2020
This event will be online only. NVIDIA will present via WebEx for approximately the first hour, and the WebEx session will remain open for the hands-on portion, where representatives from OLCF, NERSC, and NVIDIA will be available to support remote participants.
This part of the series focuses on basic optimization principles. We will introduce users to optimization strategies related to kernel launch configurations, GPU latency hiding, global memory throughput, and shared memory applicability.
After the presentation, there will be a hands-on session where participants can complete example exercises meant to reinforce the presented concepts.
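To give a flavor of the kind of optimization the lecture and exercises cover, below is a minimal sketch (not part of the official course materials; the kernel name, block size, and problem size are illustrative choices). It launches a kernel with an explicit execution configuration and stages data through shared memory so that both the global memory reads and writes stay coalesced:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reverse each block-sized chunk of the input, staging it through shared memory.
// Both the global load and the global store are coalesced; the reordering happens
// entirely in on-chip shared memory.
__global__ void reverseInBlock(const float *in, float *out, int n)
{
    __shared__ float tile[256];                  // one element per thread; must match blockDim.x
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    if (gid < n)
        tile[threadIdx.x] = in[gid];             // coalesced read from global memory
    __syncthreads();                             // wait until the whole tile is loaded

    int rev = blockDim.x - 1 - threadIdx.x;      // reversed index within this block
    int src = blockIdx.x * blockDim.x + rev;
    if (gid < n && src < n)
        out[gid] = tile[rev];                    // coalesced write back to global memory
}

int main()
{
    const int n = 1 << 20;                       // illustrative problem size (divisible by 256)
    const int threadsPerBlock = 256;             // launch configuration: threads per block
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));  // managed memory keeps the example short
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    reverseInBlock<<<blocks, threadsPerBlock>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %.0f (expected %d)\n", out[0], threadsPerBlock - 1);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Compiling this with nvcc and varying threadsPerBlock is a simple way to start exploring how the launch configuration interacts with occupancy and latency hiding, which are discussed in detail in the presentation.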
Registration
Please register here.
Presentation Materials
- Slides
- Recording
- Exercises: The example exercises for this module can be found in the "exercises/hw4" folder of this GitHub repo. (to be available shortly before the event)
- Survey: Please help us improve the series by providing feedback via a short 3-minute survey (accessible via the "Survey" tab located in the middle of this page).