Scaling CUDA C++ Applications to Multiple Nodes
-
Duration
1 day (8 hours)
-
Language
English
-
Tools, libraries, and frameworks
CUDA, MPI, NVSHMEM
Course Purpose
Present-day high-performance computing (HPC) and deep learning applications benefit from, and even require, cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilize GPUs across a cluster requires a distinct set of skills. In this workshop, you’ll learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.
NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM's asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling. NVSHMEM has been used on systems such as the Summit supercomputer located at the Oak Ridge Leadership Computing Facility (OLCF), the Lawrence Livermore National Laboratory’s Sierra supercomputer, and the NVIDIA DGX™ A100.
-
Prerequisites
Intermediate experience writing CUDA C/C++ applications
-
Assessment Type
Skills-based coding assessment: Students must refactor a single-GPU 1D wave function solver to be GPU-cluster-ready with NVSHMEM.
-
Certificate
Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.
-
Hardware Requirements
You’ll need a desktop or laptop computer capable of running the latest version of Chrome or Firefox. You’ll be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.
Learning Objectives
-
01
Learn several methods for writing multi-GPU CUDA C++ applications
-
02
Use a variety of multi-GPU communication patterns and understand their tradeoffs
-
03
Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
-
04
Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
-
05
Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges
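The domain decomposition and halo exchange named in objective 05 can be illustrated without a GPU. The following is a host-only C++ sketch, not workshop material: it splits a 1D array across a hypothetical number of ranks, pads each chunk with one ghost cell per side, and fills those ghost cells from the neighboring chunks with periodic wraparound. The names `Subdomain`, `decompose`, and `exchange_halos` are illustrative assumptions. In a real multi-GPU code, each chunk would live on a different GPU and the two copies in `exchange_halos` would become MPI or NVSHMEM transfers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One rank's slice of the global 1D domain, padded with one ghost (halo)
// cell on each side: [left ghost | interior cells ... | right ghost].
struct Subdomain {
    std::vector<double> cells;
    std::size_t interior() const { return cells.size() - 2; }
};

// Domain decomposition: split `global` into `num_ranks` equal chunks.
// Simplifying assumption: the global size divides evenly among the ranks.
std::vector<Subdomain> decompose(const std::vector<double>& global,
                                 std::size_t num_ranks) {
    assert(num_ranks > 0 && global.size() % num_ranks == 0);
    const std::size_t chunk = global.size() / num_ranks;
    std::vector<Subdomain> parts(num_ranks);
    for (std::size_t r = 0; r < num_ranks; ++r) {
        parts[r].cells.assign(chunk + 2, 0.0);
        for (std::size_t i = 0; i < chunk; ++i) {
            parts[r].cells[i + 1] = global[r * chunk + i];
        }
    }
    return parts;
}

// Halo exchange with periodic boundaries: each rank's ghost cells receive
// the adjacent interior value owned by its left and right neighbors.
// Only interior cells are read and only ghost cells are written, so the
// order in which ranks are processed does not matter.
void exchange_halos(std::vector<Subdomain>& parts) {
    const std::size_t n = parts.size();
    for (std::size_t r = 0; r < n; ++r) {
        const Subdomain& left = parts[(r + n - 1) % n];
        const Subdomain& right = parts[(r + 1) % n];
        parts[r].cells.front() = left.cells[left.interior()];  // left neighbor's last interior cell
        parts[r].cells.back() = right.cells[1];                // right neighbor's first interior cell
    }
}
```

After the exchange, a stencil update on each subdomain can read `cells[i - 1]` and `cells[i + 1]` for every interior index without touching another rank's storage, which is the property a halo exchange is meant to provide.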
Workshop Outline
-
Introduction (15 mins)
-
Multi-GPU Programming Paradigms (120 mins)
Survey multiple techniques for programming CUDA C++ applications for multiple GPUs, using a CUDA C++ program that computes a Monte Carlo approximation of pi.
-
Break (60 mins)
-
Introduction to NVSHMEM (120 mins)
Learn how to write code with NVSHMEM and understand its symmetric memory model.
-
Break (15 mins)
-
Halo Exchanges with NVSHMEM (120 mins)
Practice common coding motifs like halo exchanges and domain decomposition using NVSHMEM, and work on the assessment.
-
Final Review (15 mins)
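For readers unfamiliar with the Monte Carlo approximation of pi used in the Multi-GPU Programming Paradigms section, here is a host-only C++ sketch of the idea in SPMD form. It is an illustration under stated assumptions, not the workshop's code: each hypothetical rank runs the same `count_hits` routine with its own seed, and the per-rank hit counts are then summed, which is the step a multi-GPU version would perform with a reduction across devices. The function names, the seeding scheme, and the choice of random number generator are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Work done by one rank: draw `samples` random points in the unit square
// and count how many fall inside the quarter circle of radius 1.
std::uint64_t count_hits(std::uint64_t samples, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = unit(rng);
        const double y = unit(rng);
        if (x * x + y * y <= 1.0) {
            ++hits;
        }
    }
    return hits;
}

// SPMD-style driver: every rank runs the same code on its own random
// stream (seed offset by rank), then the hit counts are reduced by summing.
// The hit fraction estimates pi/4, the area of the quarter circle.
double estimate_pi(int num_ranks, std::uint64_t samples_per_rank,
                   std::uint64_t base_seed) {
    assert(num_ranks > 0 && samples_per_rank > 0);
    std::uint64_t total_hits = 0;
    for (int rank = 0; rank < num_ranks; ++rank) {
        total_hits += count_hits(samples_per_rank,
                                 base_seed + static_cast<std::uint64_t>(rank));
    }
    const double total_samples =
        static_cast<double>(num_ranks) * static_cast<double>(samples_per_rank);
    return 4.0 * static_cast<double>(total_hits) / total_samples;
}
```

Because the ranks share no data until the final sum, the problem scales by simply adding ranks, which makes it a convenient baseline for comparing multi-GPU communication approaches.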