# Lecture 18: GPU Computing with thrust and cub.

## Lecture Summary

* Last time
  * A three-stop journey noted in the evolution of the CUDA memory model
    * Z-C accesses on the host; the UVA milestone; the unified memory model that allowed to use of managed memory
* Today
  * GPU computing, from a distance (via thrust & CUB)

## Thrust

* Motivation
  * Increase programmer productivity
  * Do not sacrifice execution speed
* What is thrust?
  * A template library for parallel computing on GPU and CPU
  * Heavy use of C++ containers
  * Provides ready-to-use algorithms

### Namespaces, containers, iterators

* To avoid name collisions, use `thrust` vs. `std` namespaces
* 2 vector containers: host\_vector and device\_vector
  * Just like those in the C++ STL
  * Manage both host & device memory
  * Auto allocation & deallocation
* Iterators: Act like pointers for vector containers
  * Can be converted to raw containers
  * Raw pointers can also be wrapped with device\_ptr

### Algorithms

* Element-wise operations
  * for\_each, transform, gather, scatter
  * Example: SAXPY, **functor** using transform
* Reductions
  * reduce, inner\_product, reduce\_by\_key
* Prefix sums (scans)
  * inclusive\_scan, inclusive\_scan\_by\_key
* Sorting
  * sort, stable\_sort, sort\_by\_key

### General transformations. Zipping & fusing

![Problem at hand](/files/-MVK41A14RhCUuSPp4EQ)

* **Zipping**
  * Takes in multiple distinct sequences, zips into unique sequence of tuples
* **Fusing**
  * Just like zipping, but it's for reorganizing computation (instead of data) for efficient thrust processing
  * Increases the arithmetic intensity

### Thrust example: Processing rainfall data

Not covered in class

## CUB

* CUB: CUDA UnBound
* [CUB is on GitHub](https://github.com/NVIDIA/cub)
* thrust is built on top of CUB
* What CUB does
  * Parallel primitives
    * Warp-wide "collective" primitives
    * Block-wide "collective" primitives
    * Device-wide primitives
  * Utilities
    * Fancy iterators
    * Thread and thread block I/O
    * PTX intrinsics
    * Device, kernel, and storage management


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://blog.ruipan.xyz/earlier-readings-and-notes/cs759-hpc-course-notes/lecture-18-gpu-computing-with-thrust-and-cub..md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
