Some algorithms are massively parallel: applying filters to images, convolutions, matrix operations, particle-based physics simulations, evaluating neural networks. Such algorithms can often be accelerated dramatically by executing them on the GPU instead of the CPU. In this course we use Microsoft C++ AMP to write programs that perform their computations on the GPU.
This course goes close to the metal, so we spend a fair amount of time on GPU hardware architecture: since the main goal of GPU programming is better performance, we need to know which patterns work best and how to debug and optimize for GPU computing.
Microsoft C++ AMP is a C++ extension that adds a few keywords for marking functions that should execute on the GPU. This lets us write code that still looks like C++ and hits a sweet spot between development cost and performance.
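To give a flavor of what this looks like, here is a minimal element-wise vector addition using C++ AMP. It requires Visual Studio's `<amp.h>` header; the function name is illustrative, not part of any library:

```cpp
#include <amp.h>      // MSVC-only: the C++ AMP runtime
#include <vector>
using namespace concurrency;

// Element-wise vector addition on the GPU -- a minimal sketch.
void vector_add(const std::vector<float>& a,
                const std::vector<float>& b,
                std::vector<float>& sum)
{
    array_view<const float, 1> av(static_cast<int>(a.size()), a);
    array_view<const float, 1> bv(static_cast<int>(b.size()), b);
    array_view<float, 1>       sv(static_cast<int>(sum.size()), sum);
    sv.discard_data();  // no need to copy sum's old contents to the GPU

    // restrict(amp) marks the lambda as compilable for the accelerator.
    parallel_for_each(sv.extent, [=](index<1> i) restrict(amp) {
        sv[i] = av[i] + bv[i];
    });
    sv.synchronize();   // copy the results back to host memory
}
```

Note how close this stays to ordinary C++: the only AMP-specific pieces are `array_view`, `parallel_for_each`, and the `restrict(amp)` specifier.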
Experienced C++ Developers
Deep knowledge of multi-threaded programming. Familiarity with task-based concurrency, for instance with Microsoft PPL or Intel TBB.
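As a rough gauge of the expected level: the task-based style that PPL's `concurrency::task` and TBB's task groups express can also be sketched in portable standard C++. If code like the following reads naturally, you have the assumed background:

```cpp
#include <future>
#include <numeric>
#include <vector>

// Task-based concurrency in portable standard C++: split a sum into
// two tasks that may run on different threads, then join the results.
int parallel_sum(const std::vector<int>& v)
{
    auto mid = v.begin() + v.size() / 2;
    // Launch the first half as an asynchronous task.
    std::future<int> lo = std::async(std::launch::async, [&v, mid] {
        return std::accumulate(v.begin(), mid, 0);
    });
    int hi = std::accumulate(mid, v.end(), 0);  // second half on this thread
    return lo.get() + hi;                       // join and combine
}
```

For example, `parallel_sum({1, 2, 3, 4, 5, 6, 7, 8})` returns 36.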
What you will learn
- How to avoid common optimization pitfalls.
- When code benefits from parallelism.
- How the underlying hardware contributes to parallelism.
- How to take advantage of the GPU across multiple manufacturers with C++ and Microsoft AMP.
- How to avoid common parallel and heterogeneous computing pitfalls.
- C++11 Lambdas
- Measuring Performance
- Introduction to CPU and GPU Hardware
- Memory Types and Caching
- Vector programming
- Cores, Threads, Tiles and Warps
- Methods of writing code for the GPU
- Microsoft C++ AMP
- Introduction to AMP
- AMP Syntax and Data Types
- array, array_view
- How to use them
- Optimizing Memory Move and Copy
- Synchronizing memory with accelerators
- Implicit synchronization
- Lost Exceptions
- The fast_math and precise_math namespaces
- Comparison to “standard” math
- Accelerator requirements
- Debugging with Warp
- Visual Studio Tools
- GPU Threads
- Parallel Stacks
- Parallel Watch
- Floating Point Numbers
- How they are handled
- Why they differ from the CPU
- Performance of float/double operations
- Determining tile size
- Memory Coalescing
- Memory Collisions
- Tile Synchronization
- AMP Atomic Operations
- Parallel patterns with AMP
- AMP Accelerators
- Accelerator properties
- Shared memory
- Using multiple accelerators
- The concurrency::graphics namespace
- Exploiting the texture cache
- AMP Error Handling
- Detecting and Recovering from TDR
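Several of the topics above (determining tile size, tile_static memory, tile synchronization) come together in tiled algorithms. As a taste, here is a sketch of a tiled partial sum, again assuming MSVC's `<amp.h>`; the tile size and function name are illustrative choices:

```cpp
#include <amp.h>
using namespace concurrency;

static const int TILE = 256;  // tile size: a key tuning parameter

// Sums 'data' (length a multiple of TILE) into one partial sum per tile.
void tiled_partial_sums(array_view<const float, 1> data,
                        array_view<float, 1> partial)  // one slot per tile
{
    parallel_for_each(data.extent.tile<TILE>(),
                      [=](tiled_index<TILE> idx) restrict(amp) {
        tile_static float local[TILE];          // fast per-tile memory
        local[idx.local[0]] = data[idx.global];
        idx.barrier.wait();                     // tile synchronization

        // Tree reduction within the tile.
        for (int stride = TILE / 2; stride > 0; stride /= 2) {
            if (idx.local[0] < stride)
                local[idx.local[0]] += local[idx.local[0] + stride];
            idx.barrier.wait();
        }
        if (idx.local[0] == 0)
            partial[idx.tile[0]] = local[0];
    });
}
```

The choice of `TILE` trades per-tile parallelism against tile_static memory pressure and occupancy, which is exactly the kind of tuning the course covers.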