# Lecture 19: Hardware aspects relevant in multi-core, shared memory parallel computing.

## Lecture Summary

* Last time
  * GPU computing via thrust & CUB
* Today
  * Final project proposal discussion
  * Parallel computing on the CPU: Hardware & OpenMP generalities

## Multi-core Parallel Computing with OpenMP

![Opportunities for efficiency gains](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPm9FYaozb_mgBleSp%2F-MWPrN6gEX0vz3CgMsfo%2FScreen%20Shot%202021-03-22%20at%2011.56.06%20AM.png?alt=media\&token=0f7667ba-91da-4caf-a962-6f289e5d4683)

* OpenMP targets parallelism on SMP (shared-memory, symmetric multiprocessing) architectures
* It is handy when
  * You have a multi-core processor, say 16 cores per socket (beyond that, synchronization and memory overheads yield diminishing returns)
  * Might have multiple sockets, say 2
  * You have a good amount of system memory, say 64 GB
* Processes and threads are similar in that both are independent sequences of execution
  * OpenMP deals in threads, while MPI deals in processes
  * Threads of the same process run in a shared memory space and share one page table. Processes, on the other hand, run in separate memory spaces, each with its own page table.
* We want to use OpenMP for both data parallelism and task parallelism
  * Data parallelism: The processing of a large amount of data elements can be done in parallel
  * Task parallelism: The execution of a collection of tasks can be performed in parallel

![Hello world for OpenMP](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPm9FYaozb_mgBleSp%2F-MWPuU_0b0W369Q-EOrI%2FScreen%20Shot%202021-03-22%20at%2012.09.44%20PM.png?alt=media\&token=ab109864-ec0f-45d8-b3ac-8f05e5290c79)

* The OMP parallel region is similar to a CUDA kernel: both are executed by threads
  * A major difference
    * Variables inside GPU kernel are truly local variables, stored in registers
    * Variables in an OMP parallel region may or may not be visible to the other threads executing that region: the scoping rules (shared vs. private) are tricky
* `#include <omp.h>`
* Most OpenMP constructs are compiler directives. In C/C++, they take the form of `#pragma omp` directives
* Programming model (fork-join): a master thread spawns a team of threads at a parallel region; the team joins back into the master thread when the region ends

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPm9FYaozb_mgBleSp%2F-MWPv6oLzMgLG4JiWKzg%2FScreen%20Shot%202021-03-22%20at%2012.12.27%20PM.png?alt=media\&token=21db5ce9-d7fe-4a98-b465-d7adb43067c9)
