
Accelerated Computing
Registration Form: https://forms.gle/cxunwyhdDea3DEzLA
Fundamentals of Accelerated Computing with CUDA C/C++
Duration: 8 Hours
The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. Experience C/C++ application acceleration by:
- Accelerating CPU-only applications to exploit their latent parallelism on GPUs
- Utilizing essential CUDA memory management techniques to optimize accelerated applications
- Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
- Leveraging command line and visual profiling to guide and check your work
Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
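The workflow described above, launching GPU kernels, using CUDA's managed memory, and iterating toward optimized code, can be illustrated with a minimal SAXPY sketch. This is not from the course materials; it assumes an NVIDIA GPU and the `nvcc` compiler.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU kernel: each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified (managed) memory is accessible from both CPU and GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);  // launch on the GPU
    cudaDeviceSynchronize();                    // wait for the kernel to finish

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc saxpy.cu -o saxpy`. The course then uses the Nsight Systems profiler on programs like this to guide further optimization.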
See the NVIDIA website for full details.
Fundamentals of Accelerated Computing with CUDA Python
Duration: 8 Hours
This course explores how to use Numba—the just-in-time, type-specializing Python function compiler—to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:
- Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
- Use Numba to create and launch custom CUDA kernels.
- Apply key GPU memory management techniques.
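The three bullets above can be sketched in a few lines of Numba. This is an illustrative example, not course material; it assumes the `numba` package and a CUDA-capable GPU.

```python
import numpy as np
from numba import vectorize, cuda

# A CUDA ufunc: Numba compiles this scalar function into a GPU kernel
# that broadcasts over NumPy arrays (target='cuda' requires a GPU).
@vectorize(['float32(float32, float32)'], target='cuda')
def add_gpu(a, b):
    return a + b

# A custom CUDA kernel written directly in Python.
@cuda.jit
def scale(out, arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = arr[i] * factor

x = np.arange(1024, dtype=np.float32)
y = np.ones_like(x)
print(add_gpu(x, y)[:3])            # ufunc runs on the GPU

d_x = cuda.to_device(x)             # explicit host-to-device transfer
d_out = cuda.device_array_like(x)   # allocate output on the device
scale[(x.size + 255) // 256, 256](d_out, d_x, 2.0)
print(d_out.copy_to_host()[:3])     # explicit device-to-host transfer
```

Managing transfers explicitly, as in the last four lines, avoids redundant copies when several kernels operate on the same device data.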
See the NVIDIA website for full details.
Scaling Workloads Across Multiple GPUs with CUDA C++
Duration: 4 Hours
Writing CUDA C++ applications that efficiently and correctly utilize all available GPUs on a node drastically improves performance over single-GPU code and makes the most cost-effective use of compute nodes with multiple GPUs. In this workshop you will learn to utilize multiple GPUs on a single node by:
- Learning how to launch kernels on multiple GPUs, each working on a subsection of the required work
- Learning how to use concurrent CUDA Streams to overlap memory copy with computation on multiple GPUs
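The first technique, one kernel launch per GPU, each handling a subsection of the data, can be sketched as follows. This is an illustrative example, not course material; it assumes the problem size divides evenly across the available GPUs.

```cuda
#include <cuda_runtime.h>

__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    int numGpus = 0;
    cudaGetDeviceCount(&numGpus);

    const int n = 1 << 22;
    const int chunk = n / numGpus;  // assumes n divides evenly across GPUs
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));

    // Launch one kernel per GPU, each on its own subsection of the data.
    for (int gpu = 0; gpu < numGpus; ++gpu) {
        cudaSetDevice(gpu);
        float *sub = data + gpu * chunk;
        work<<<(chunk + 255) / 256, 256>>>(sub, chunk);
    }
    // Synchronize every device before using the results.
    for (int gpu = 0; gpu < numGpus; ++gpu) {
        cudaSetDevice(gpu);
        cudaDeviceSynchronize();
    }
    cudaFree(data);
    return 0;
}
```

Kernel launches are asynchronous with respect to the host, so the first loop returns immediately and all GPUs compute their chunks concurrently.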
See the NVIDIA website for full details.
Accelerating CUDA C++ Applications with Concurrent Streams
Duration: 4 Hours
The concurrent overlap of GPU computation and the transfer of memory to and from the GPU can drastically improve the performance of CUDA applications. In this workshop you will learn to utilize CUDA Streams to perform copy/compute overlap in CUDA C++ applications by:
- Learning the rules and syntax governing the use of concurrent CUDA Streams
- Refactoring and optimizing an existing CUDA C++ application to use CUDA Streams and perform copy/compute overlap
- Relying on the NVIDIA® Nsight™ Systems visual profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop
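The copy/compute overlap pattern described above can be sketched as follows. This is an illustrative example, not course material; it splits the data across several streams so one chunk's transfer overlaps with another chunk's computation.

```cuda
#include <cuda_runtime.h>

__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 24, numStreams = 4, chunk = n / numStreams;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));  // pinned host memory, required for async copies
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, computes on it, and copies it back.
    // Operations within a stream run in order; operations in different
    // streams may overlap, e.g. stream 1's copy during stream 0's kernel.
    for (int s = 0; s < numStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

In the Nsight Systems timeline, a correct refactoring like this shows copy and kernel rows overlapping instead of the serial copy-compute-copy staircase of a single-stream version.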