Rui's Blog
  • Rui's Blog/Paper Reading Notes - Introduction
  • Personal Blog
    • Personal Blog - Index
      • How to Create Picture-in-Picture Effect / Video Overlay for a Presentation Video
      • How to Do Your Part to Protect the Environment in Wisconsin
      • How to Get a Driver's License in Wisconsin
      • How to Travel from the U.S. to China onboard AA127 in June 2021
      • How to Transfer Credits Back to UW-Madison
      • Resources on Learning Academic Writing (for Computer Science)
    • Towards applying to CS Ph.D. programs
  • Machine Learning Systems
    • Machine Learning Systems - Index
      • MLSys Papers - Short Notes
      • [2011 NSDI] Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
      • [2014 OSDI] Scaling Distributed Machine Learning with the Parameter Server
      • [2018 OSDI] Gandiva: Introspective Cluster Scheduling for Deep Learning
      • [2018 SIGCOMM] Chameleon: Scalable Adaptation of Video Analytics via Temporal and Cross-camera ...
      • [2018 NIPS] Dynamic Space-Time Scheduling for GPU Inference
      • [2019 ATC] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
      • [2019 NSDI] Tiresias: A GPU Cluster Manager for Distributed Deep Learning
      • [2019 SOSP] ByteScheduler: A Generic Communication Scheduler for Distributed DNN Training ...
      • [2019 SOSP] PipeDream: Generalized Pipeline Parallelism for DNN Training
      • [2019 SOSP] Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
      • [2019 NIPS] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
      • [2019 SC] ZeRO: memory optimizations toward training trillion parameter models
      • [2020 OSDI] Gavel: Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
      • [2020 OSDI] AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
      • [2020 OSDI] BytePS: A High Performance and Generic Framework for Distributed DNN Training
      • [2020 SIGCOMM] Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics
      • [2020 MLSys] Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
      • [2020 EuroSys] AlloX: Compute Allocation in Hybrid Clusters
      • [2020 VLDB] PyTorch Distributed: Experiences on Accelerating Data Parallel Training
      • [2020 NetAI] Is Network the Bottleneck of Distributed Training?
      • [2020 NSDI] Themis: Fair and Efficient GPU Cluster Scheduling
      • [2021 MLSys] Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
      • [2021 VLDB] Analyzing and Mitigating Data Stalls in DNN Training
      • [2021 FAST] CheckFreq: Frequent, Fine-Grained DNN Checkpointing
      • [2021 EuroMLSys] Interference-Aware Scheduling for Inference Serving
      • [2021 OSDI] Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
      • [2021 MLSys] Wavelet: Efficient DNN Training with Tick-Tock Scheduling
      • [2021 NSDI] SwitchML: Scaling Distributed Machine Learning with In-Network Aggregation
    • Big Data Systems - Index
      • Big Data Systems Papers - Short Notes
      • [2003 SOSP] The Google File System
      • [2004 OSDI] MapReduce: Simplified Data Processing on Large Clusters
      • [2010 SIGMOD] Pregel: A System for Large-Scale Graph Processing
      • [2011 NSDI] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
      • [2012 NSDI] Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster ...
      • [2012 OSDI] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
      • [2019 FAST] DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed...
      • [2021 HotOS] From Cloud Computing to Sky Computing
      • [2021 EuroSys] NextDoor: Accelerating graph sampling for graph machine learning using GPUs
  • Earlier Readings & Notes
    • High Performance Computing Course Notes
      • Lecture 1: Course Overview
      • Lecture 2: From Code to Instructions. The FDX Cycle. Instruction Level Parallelism.
      • Lecture 3: Superscalar architectures. Measuring Computer Performance. Memory Aspects.
      • Lecture 4: The memory hierarchy. Caches.
      • Lecture 5: Caches, wrap up. Virtual Memory.
      • Lecture 6: The Walls to Sequential Computing. Moore’s Law.
      • Lecture 7: Parallel Computing. Flynn's Taxonomy. Amdahl's Law.
      • Lecture 8: GPU Computing Intro. The CUDA Programming Model. CUDA Execution Configuration.
      • Lecture 9: GPU Memory Spaces
      • Lecture 10: GPU Scheduling Issues.
      • Lecture 11: Execution Divergence. Control Flow in CUDA. CUDA Shared Memory Issues.
      • Lecture 12: Global Memory Access Patterns and Implications.
      • Lecture 13: Atomic operations in CUDA. GPU code optimization rules of thumb.
      • Lecture 14: CUDA Case Studies. (1) 1D Stencil Operation. (2) Vector Reduction in CUDA.
      • Lecture 15: CUDA Case Studies. (3) Parallel Prefix Scan on the GPU. Using Multiple Streams in CUDA.
      • Lecture 16: Streams, and overlapping data copy with execution.
      • Lecture 17: GPU Computing: Advanced Features.
      • Lecture 18: GPU Computing with thrust and cub.
      • Lecture 19: Hardware aspects relevant in multi-core, shared memory parallel computing.
      • Lecture 20: Multi-core Parallel Computing with OpenMP. Parallel Regions.
      • Lecture 21: OpenMP Work Sharing.
      • Lecture 22: OpenMP Work Sharing.
      • Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP.
      • Lecture 24: Critical Thinking. Code Optimization Aspects.
      • Lecture 25: Computing with Supercomputers.
      • Lecture 26: MPI Parallel Programming General Introduction. Point-to-Point Communication.
      • Lecture 27: MPI Parallel Programming Point-to-Point communication: Blocking vs. Non-blocking sends.
      • Lecture 28: MPI Parallel Programming: MPI Collectives. Overview of topics covered in the class.
    • Cloud Computing Course Notes
      • 1.1 Introduction to Clouds, MapReduce
      • 1.2 Gossip, Membership, and Grids
      • 1.3 P2P Systems
      • 1.4 Key-Value Stores, Time, and Ordering
      • 1.5 Classical Distributed Algorithms
      • 4.1 Spark, Hortonworks, HDFS, CAP
      • 4.2 Large Scale Data Storage
    • Operating Systems Papers - Index
      • CS 736 @ UW-Madison Fall 2020 Reading List
      • All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications
      • ARC: A Self-Tuning, Low Overhead Replacement Cache
      • A File is Not a File: Understanding the I/O Behavior of Apple Desktop Applications
      • Biscuit: The benefits and costs of writing a POSIX kernel in a high-level language
      • Data Domain: Avoiding the Disk Bottleneck in the Data Domain Deduplication File System
      • Disco: Running Commodity Operating Systems on Scalable Multiprocessors
      • FFS: A Fast File System for UNIX
      • From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees
      • LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
      • LFS: The Design and Implementation of a Log-Structured File System
      • Lottery Scheduling: Flexible Proportional-Share Resource Management
      • Memory Resource Management in VMware ESX Server
      • Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks
      • NFS: Sun's Network File System
      • OptFS: Optimistic Crash Consistency
      • RAID: A Case for Redundant Arrays of Inexpensive Disks
      • RDP: Row-Diagonal Parity for Double Disk Failure Correction
      • Resource Containers: A New Facility for Resource Management in Server Systems
      • ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay
      • Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
      • SnapMirror: File-System-Based Asynchronous Mirroring for Disaster Recovery
      • The Linux Scheduler: a Decade of Wasted Cores
      • The Unwritten Contract of Solid State Drives
      • Venti: A New Approach to Archival Storage
    • Earlier Notes
      • How to read a paper
  • FIXME
    • Template for Paper Reading Notes

High Performance Computing Course Notes

ECE/ME/EMA/CS 759: High Performance Computing for Engineering Applications, Spring 2021, taught by Prof. Dan Negrut

Acknowledgments

  • All slides/files linked are accessible on Box using a UW-Madison account

  • Almost every figure and piece of code in these notes is excerpted from Prof. Dan Negrut's course slides. Some of his slides draw on material from other sources, which he credits in the slides themselves.

Table of Contents

  • Slides for ME759 (of the whole semester)

  • Slides from ME459 (Computing Concepts for Applications in Engineering)

Date | Title | Recommended Readings
---- | ----- | --------------------
1/25 | Lecture 1: Course Overview | Basic Linux Command Line Usage
1/27 | Lecture 2: From Code to Instructions. The FDX Cycle. Instruction Level Parallelism. | Euler usage; Slurm usage (ME459 p95-97)
1/29 | Lecture 3: Superscalar architectures. Measuring Computer Performance. Memory Aspects. | C recap (ME459 p114-); C book
2/1 | Lecture 4: The memory hierarchy. Caches. | Build mgmt & cmake (ME459 p354-); gdb recap (ME459 p649-); Ch.5 of the ...
2/3 | Lecture 5: Caches, wrap up. Virtual Memory. | Git (ME459 p449-); How to Write a Git Commit
2/5 | Lecture 6: The Walls to Sequential Computing. Moore's Law. | Validity of the single processor approach to achieving large scale computing capabilities (Amdahl, '67)
2/8 | Lecture 7: Parallel Computing. Flynn's Taxonomy. Amdahl's Law. | Structured Programming w/ go to Statements (Knuth, '74)
2/10 | Lecture 8: GPU Computing Intro. The CUDA Programming Model. CUDA Execution Configuration. | Modern Microprocessors: A 90-Minute Guide (Patterson, '01)
2/12 | Lecture 9: GPU Memory Spaces. | Optimizations in C++ Compilers (Godbolt, 2019)
2/15 | Lecture 10: GPU Scheduling Issues. | NVIDIA Tesla Architecture
2/17 | Lecture 11: Execution Divergence. Control Flow in CUDA. CUDA Shared Memory Issues. | CUDA C++ Programming Guide
2/19 | Lecture 12: Global Memory Access Patterns and Implications. | The GPU Computing Era (Nickolls & Dally, '10)
2/22 | Lecture 13: Atomic operations in CUDA. GPU code optimization rules of thumb. | Unified Memory in CUDA 6: A Brief Overview
2/24 | Lecture 14: CUDA Case Studies. (1) 1D Stencil Operation. (2) Vector Reduction in CUDA. | Maximizing Unified Memory Performance in CUDA (Sakharnykh, '17)
2/26 | Lecture 15: CUDA Case Studies. (3) Parallel Prefix Scan on the GPU. Using Multiple Streams in CUDA. | Titles of GTC '21 Talks
3/1 | Lecture 16: Streams, and overlapping data copy with execution. | Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking (Citadel, '18); CUDA C++ Best Practices Guide
3/3 | Lecture 17: GPU Computing: Advanced Features. | GTC '18 Talk on Unified Memory
3/5 | Lecture 18: GPU Computing with thrust and cub. | Thrust: A Productivity-Oriented Library for CUDA (Bell & Hoberock, '11)
3/8 | Lecture 19: Hardware aspects relevant in multi-core, shared memory parallel computing. | Unified Memory in CUDA 6: A Brief Overview and Related Data Access/Transfer Issues (by Dan and some other guys! '14)
3/10 | Lecture 20: Multi-core Parallel Computing with OpenMP. Parallel Regions. | Cache Coherence on Power 9 - Volta systems w/ NVLINK2
3/12 | Lecture 21: OpenMP Work Sharing. | Node-Level Performance Engineering (SC '19)
3/15 | Lecture 22: OpenMP Work Sharing. | Advanced OpenMP: Performance and 5.0 Features (SC '19)
3/17 | Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP. | Mastering Tasking with OpenMP (SC '19)
3/19 | Lecture 24: Critical Thinking. Code Optimization Aspects. | Ch. 12 of Optimizing Software in C++
3/22 | Lecture 25: Computing with Supercomputers. |
3/24 | Lecture 26: MPI Parallel Programming General Introduction. Point-to-Point Communication. | HPC Perspectives (Dongarra et al., '05)
3/26 | Lecture 27: MPI Parallel Programming Point-to-Point communication: Blocking vs. Non-blocking sends. | Advanced MPI Programming (SC '19)
3/29 | Lecture 28: MPI Parallel Programming: MPI Collectives. Overview of topics covered in the class. |