
Accelerated Computing
Registration Form: https://forms.gle/cxunwyhdDea3DEzLA
Fundamentals of Accelerated Computing with CUDA C/C++
Duration: 8 Hours
The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. Experience C/C++ application acceleration by:
- Accelerating CPU-only applications to exploit their latent parallelism on GPUs
- Utilizing essential CUDA memory management techniques to optimize accelerated applications
- Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
- Leveraging command line and visual profiling to guide and check your work
Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
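The workflow described above, launching GPU kernels, using CUDA's managed memory, and iterating toward optimized code, can be illustrated with a minimal SAXPY sketch. This is not from the course materials; it assumes an NVIDIA GPU and the `nvcc` compiler.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU kernel: each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified (managed) memory is accessible from both CPU and GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);  // launch on the GPU
    cudaDeviceSynchronize();                    // wait for the kernel to finish

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc saxpy.cu -o saxpy`. The course then uses the Nsight Systems profiler on programs like this to guide further optimization.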
See the NVIDIA website for full details.
Fundamentals of Accelerated Computing with CUDA Python
Duration: 8 Hours
This course explores how to use Numba—the just-in-time, type-specializing Python function compiler—to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:
- Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
- Use Numba to create and launch custom CUDA kernels.
- Apply key GPU memory management techniques.
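The three bullets above can be sketched in a few lines of Numba. This is an illustrative example, not course material; it assumes the `numba` package and a CUDA-capable GPU.

```python
import numpy as np
from numba import vectorize, cuda

# A CUDA ufunc: Numba compiles this scalar function into a GPU kernel
# that broadcasts over NumPy arrays (target='cuda' requires a GPU).
@vectorize(['float32(float32, float32)'], target='cuda')
def add_gpu(a, b):
    return a + b

# A custom CUDA kernel written directly in Python.
@cuda.jit
def scale(out, arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = arr[i] * factor

x = np.arange(1024, dtype=np.float32)
y = np.ones_like(x)
print(add_gpu(x, y)[:3])            # ufunc runs on the GPU

d_x = cuda.to_device(x)             # explicit host-to-device transfer
d_out = cuda.device_array_like(x)   # allocate output on the device
scale[(x.size + 255) // 256, 256](d_out, d_x, 2.0)
print(d_out.copy_to_host()[:3])     # explicit device-to-host transfer
```

Managing transfers explicitly, as in the last four lines, avoids redundant copies when several kernels operate on the same device data.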
See the NVIDIA website for full details.
Scaling Workloads Across Multiple GPUs with CUDA C++
Duration: 4 Hours
Writing CUDA C++ applications that efficiently and correctly utilize all available GPUs on a node drastically improves performance over single-GPU code and makes the most cost-effective use of compute nodes with multiple GPUs. In this workshop you will learn to utilize multiple GPUs on a single node by:
- Learning how to launch kernels on multiple GPUs, each working on a subsection of the required work
- Learning how to use concurrent CUDA Streams to overlap memory copy with computation on multiple GPUs
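The first technique, one kernel launch per GPU, each handling a subsection of the data, can be sketched as follows. This is an illustrative example, not course material; it assumes the problem size divides evenly across the available GPUs.

```cuda
#include <cuda_runtime.h>

__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    int numGpus = 0;
    cudaGetDeviceCount(&numGpus);

    const int n = 1 << 22;
    const int chunk = n / numGpus;  // assumes n divides evenly across GPUs
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));

    // Launch one kernel per GPU, each on its own subsection of the data.
    for (int gpu = 0; gpu < numGpus; ++gpu) {
        cudaSetDevice(gpu);
        float *sub = data + gpu * chunk;
        work<<<(chunk + 255) / 256, 256>>>(sub, chunk);
    }
    // Synchronize every device before using the results.
    for (int gpu = 0; gpu < numGpus; ++gpu) {
        cudaSetDevice(gpu);
        cudaDeviceSynchronize();
    }
    cudaFree(data);
    return 0;
}
```

Kernel launches are asynchronous with respect to the host, so the first loop returns immediately and all GPUs compute their chunks concurrently.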
See the NVIDIA website for full details.
Accelerating CUDA C++ Applications with Concurrent Streams
Duration: 4 Hours
The concurrent overlap of GPU computation and the transfer of memory to and from the GPU can drastically improve the performance of CUDA applications. In this workshop you will learn to utilize CUDA Streams to perform copy/compute overlap in CUDA C++ applications by:
- Learning the rules and syntax governing the use of concurrent CUDA Streams
- Refactoring and optimizing an existing CUDA C++ application to use CUDA Streams and perform copy/compute overlap
- Relying on the NVIDIA® Nsight™ Systems visual profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop
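The copy/compute overlap pattern described above can be sketched as follows. This is an illustrative example, not course material; it splits the data across several streams so one chunk's transfer overlaps with another chunk's computation.

```cuda
#include <cuda_runtime.h>

__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 24, numStreams = 4, chunk = n / numStreams;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));  // pinned host memory, required for async copies
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, computes on it, and copies it back.
    // Operations within a stream run in order; operations in different
    // streams may overlap, e.g. stream 1's copy during stream 0's kernel.
    for (int s = 0; s < numStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

In the Nsight Systems timeline, a correct refactoring like this shows copy and kernel rows overlapping instead of the serial copy-compute-copy staircase of a single-stream version.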