# Lecture 19: Hardware aspects relevant in multi-core, shared-memory parallel computing

## Lecture Summary

* Last time
  * GPU computing via Thrust & CUB
* Today
  * Final project proposal discussion
  * Parallel computing on the CPU: Hardware & OpenMP generalities

## Multi-core Parallel Computing with OpenMP

![Opportunities for efficiency gains](/files/-MWPrN6gEX0vz3CgMsfo)

* OpenMP targets parallelism on SMP (symmetric multiprocessing, i.e., shared-memory) architectures
* It is handy when
  * You have a multi-core processor, say 16 cores/socket (beyond that, overheads lead to diminishing returns)
  * You might have multiple sockets, say 2
  * You have a good amount of system memory, say 64 GB
* Processes and threads are similar in the sense that they are both independent sequences of execution
  * OpenMP works with threads, while MPI works with processes
  * Threads of the same process run in a shared memory space and share one page table for address translation. Processes, on the other hand, run in separate memory spaces.
* We want to use OpenMP for both data parallelism and task parallelism
  * Data parallelism: The processing of a large amount of data elements can be done in parallel
  * Task parallelism: The execution of a collection of tasks can be performed in parallel

![Hello world for OpenMP](/files/-MWPuU_0b0W369Q-EOrI)

* The OMP parallel region is similar to a CUDA kernel: both are executed by threads
  * A major difference
    * Variables declared inside a GPU kernel are truly local to each thread and are typically stored in registers
    * Variables in an OMP parallel region may or may not be visible to the other threads executing that region, so the scoping is tricky
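* Use `#include <omp.h>` to get access to the OpenMP runtime library functions
* Most OpenMP constructs are compiler directives. In C/C++, they take the form of pragmas (`#pragma omp ...`)
* Programming model (fork-join): A master thread spawns a team of threads at the start of a parallel region, and the team joins back into the master thread at the end of the region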

![](/files/-MWPv6oLzMgLG4JiWKzg)

