# Lecture 20: Multi-core Parallel Computing with OpenMP. Parallel Regions.

## Lecture Summary

* Last time: OpenMP generalities
* This time: OpenMP nuts & bolts

## OpenMP

![Compiler directive examples (the directive follows \`#pragma omp\`)](/files/-MWPxtwjfZnxLR5FZYs_)

![User-level run time routines](/files/-MWPyGK8qdfNVpiiy1An)

![Environment variables. They avoid the run-time function calls, but settings made this way are fixed for the whole run and do not allow for dynamic OpenMP behavior. A run-time function call overrides the corresponding env var setting.](/files/-MWPzoPbfOsAB5BoHFZR)

* OpenMP: portable and scalable model for shared memory parallel applications
  * No need to dive deep and work with POSIX pthreads
  * Under the hood, the compiler and its runtime typically translate OpenMP functions and directives to pthread calls
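* Structured block and OpenMP construct are the two sides of the “parallel region” coin: the construct is the directive, the structured block is the code it applies to
* A structured block has a single entry at the top and a single exit at the bottom. The only "branches" out of it that are allowed are exit() function calls. There is an implicit barrier at the end of a parallel region's structured block, where threads wait on each other.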
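To make the parallel region idea concrete, here is a minimal sketch in C. The helper name count_team_members is hypothetical, and the serial fallback for omp_set_num_threads is an assumption added only so the file still builds when OpenMP is disabled (with GCC, OpenMP is enabled by -fopenmp).

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without OpenMP. */
static inline void omp_set_num_threads(int n) { (void)n; }
#endif

/* Hypothetical helper: open a parallel region with at most `requested`
 * threads and return how many threads actually executed the block. */
int count_team_members(int requested)
{
    assert(requested > 0);
    int members = 0;

    /* Run-time routine: takes precedence over OMP_NUM_THREADS. */
    omp_set_num_threads(requested);

    #pragma omp parallel
    {
        /* Structured block: one entry at the top, one exit at the bottom.
         * Every thread of the team executes it once. */
        #pragma omp atomic
        members++;
    }   /* implicit barrier: all threads are done before we continue */

    return members;
}
```

Calling count_team_members(4) returns a value between 1 and 4. The runtime may grant fewer threads than requested, and a build without OpenMP always returns 1.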

![](/files/-MWQ3xI0kcbZyEtNFq20)

![](/files/-MWQ4KyzoRLHdvtw2S0e)

### Nested Parallelism

![](/files/-MWQ4wpwOB2h9VcxWNQG)
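Nested parallelism means that a thread inside a parallel region opens another parallel region. The sketch below shows one way to switch it on through the run-time API, using omp_set_max_active_levels. The helper name nested_inner_visits and the serial fallback are assumptions for illustration.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without OpenMP. */
static inline void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Hypothetical helper: count how many times the innermost block runs
 * when an outer team of `outer` threads each spawns `inner` threads. */
int nested_inner_visits(int outer, int inner)
{
    assert(outer > 0 && inner > 0);
    int visits = 0;

    /* Allow two levels of active (multi-threaded) parallel regions. */
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(outer)
    {
        /* Each outer thread becomes the master of its own inner team. */
        #pragma omp parallel num_threads(inner)
        {
            #pragma omp atomic
            visits++;
        }
    }
    return visits;
}
```

With nesting active, nested_inner_visits(2, 3) returns at most 6. The runtime is free to provide smaller teams, so the exact count is not guaranteed.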

* The nested parallelism behavior can be controlled by using the OpenMP API
* The single directive identifies a section of the code that must be run by exactly one thread of the team
  * The difference between single and master is that in single, the code is executed by whichever thread reaches the region first, whereas in master it is always executed by the master thread (thread 0)
  * Another difference is that for single, there is an implicit barrier upon completion of the region, whereas master has none
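The contrast between single and master can be sketched as follows. The function name single_vs_master is hypothetical. The two counters only confirm that each block runs exactly once per parallel region, regardless of the team size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: report how often the single block and the
 * master block were executed inside one parallel region. */
void single_vs_master(int *single_runs, int *master_runs)
{
    assert(single_runs != NULL && master_runs != NULL);
    int s = 0, m = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            s++;    /* run once, by whichever thread arrives first */
        }           /* implicit barrier here (unless nowait is given) */

        #pragma omp master
        {
            m++;    /* run once, always by thread 0 */
        }           /* no barrier: the other threads run straight past */
    }               /* implicit barrier at the end of the parallel region */

    *single_runs = s;
    *master_runs = m;
}
```

Both counters end up at 1. What differs is which thread does the work and whether the rest of the team waits for it.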

![](/files/-MWQ6O-K0uwovOYdkqGB)

### Work Sharing

* Work sharing is a general term used in OpenMP to describe the distribution of work across threads
* The three main constructs for automatic work division are:
  * omp for
  * omp sections
  * omp task
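Of the three constructs, omp task is only named in these notes, so here is a small sketch of what it looks like in practice: a recursive Fibonacci in which each recursive call becomes a task. The function names fib_task and fib_parallel are hypothetical, and this is meant as an illustration of the construct, not as an efficient way to compute Fibonacci numbers.

```c
#include <assert.h>

/* Hypothetical helper: each recursive call is packaged as a task that
 * any idle thread of the team may pick up. */
static long fib_task(int n)
{
    if (n < 2) return n;
    long a, b;

    #pragma omp task shared(a)
    a = fib_task(n - 1);

    #pragma omp task shared(b)
    b = fib_task(n - 2);

    /* Wait for the two child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

long fib_parallel(int n)
{
    assert(n >= 0);
    long result = 0;

    #pragma omp parallel
    {
        /* One thread creates the root task; the others execute tasks. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

For example, fib_parallel(10) evaluates to 55.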

### omp for

![](/files/-MWQ7rSOacz3mKGhmAWW)

* A #pragma omp for inside a #pragma omp parallel is equivalent to #pragma omp parallel for
* By default, most OpenMP implementations use block partitioning, where each thread is assigned one contiguous block of roughly n/thread\_count iterations. This may lead to load imbalance if the work per iteration varies
  * The schedule clause comes to the rescue!
  * Usage example: #pragma omp parallel for schedule(static, 8)
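The following sketch shows what schedule(static, 8) does to the iteration space: chunks of 8 consecutive iterations are dealt out to the threads round-robin, in thread-number order. It also uses the split form (omp parallel with an omp for inside). The helper name record_owners and the serial fallbacks are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without OpenMP. */
static inline int omp_get_thread_num(void) { return 0; }
static inline int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: store in owner[i] the id of the thread that ran
 * iteration i, and return the size of the team. */
int record_owners(int *owner, int n)
{
    assert(owner != NULL && n >= 0);
    int team = 1;

    #pragma omp parallel
    {
        /* One thread records the team size; single ends with a barrier. */
        #pragma omp single
        team = omp_get_num_threads();

        /* Chunks of 8 iterations, assigned round-robin to the threads. */
        #pragma omp for schedule(static, 8)
        for (int i = 0; i < n; i++) {
            owner[i] = omp_get_thread_num();
        }
    }
    return team;
}
```

With a team of 3 threads and n = 64, iterations 0 to 7 go to thread 0, 8 to 15 to thread 1, 16 to 23 to thread 2, 24 to 31 to thread 0 again, and so on. In general, owner[i] equals (i / 8) % team.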

![](/files/-MWQ9Op2h05CJX-260_A)

![Effects of different schedules, assuming 3 threads](/files/-MWQ9lQXTJC4JTmbSkAx)

![Choosing a schedule](/files/-MWQ9un8IUUZ2x-N6Amt)
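When the work per iteration varies, a dynamic schedule is one option. In the sketch below, iteration i performs i + 1 steps, so a block partition would hand most of the work to the thread that owns the last block. With schedule(dynamic, 4), threads instead grab chunks of 4 iterations whenever they become idle. The function name triangular_work and the chunk size 4 are arbitrary choices for illustration.

```c
#include <assert.h>

/* Hypothetical helper: a deliberately imbalanced loop, where iteration i
 * costs i + 1 inner steps. Returns the total number of steps. */
long triangular_work(int n)
{
    assert(n >= 0);
    long total = 0;

    /* Idle threads fetch the next chunk of 4 iterations on demand;
     * the reduction gives each thread a private partial sum. */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            total += 1;
        }
    }
    return total;
}
```

The result is n(n+1)/2 regardless of the schedule, for example 5050 for n = 100. Only the distribution of iterations across threads changes.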

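* OpenMP will only parallelize for loops that are in canonical form, i.e., loops whose iteration count can be determined before the loop starts: a loop variable with a fixed start, a loop-invariant bound, a loop-invariant increment, and no break out of the loop body. Otherwise, counterintuitive behavior may happen
* The collapse clause supports collapsing perfectly nested loops into one larger loop, whose iterations are then divided among the threads
  * For example, if the outer loop has 10 iters, the inner loop has 10^7 iters, and we have 32 threads: parallelizing the outer loop is bad (10<32, so at least 22 threads sit idle), parallelizing the inner loop is good, but we can do better using collapse, which distributes all 10 × 10^7 iterations at once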
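A minimal sketch of the collapse clause on a two-level loop nest. The function name count_collapsed is hypothetical, and the loop body just counts iterations so that the result is easy to check.

```c
#include <assert.h>

/* Hypothetical helper: visit every (i, j) pair of an outer x inner loop
 * nest and return the number of pairs visited. */
long count_collapsed(int outer, int inner)
{
    assert(outer >= 0 && inner >= 0);
    long visited = 0;

    /* collapse(2) fuses both loops into one iteration space of
     * outer * inner iterations, which is then split among the threads.
     * The loops must be perfectly nested: no code between the two fors. */
    #pragma omp parallel for collapse(2) reduction(+:visited)
    for (int i = 0; i < outer; i++) {
        for (int j = 0; j < inner; j++) {
            visited += 1;
        }
    }
    return visited;
}
```

With outer = 10 and 32 threads, the fused space has 10 * inner iterations, so every thread can receive work even though the outer loop alone could not occupy them.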

![](/files/-MWQBmo703WkJX7H36YC)

### omp sections

![](/files/-MWQC0wYU64VC7X_6cjq)

![](/files/-MWQCApH6GluvFoFZZLA)

![](/files/-MWQCaCEH-vq5kTPLx-f)

![](/files/-MWQCcvbiRaXqkfBcdvx)
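As a small sketch of omp sections, the function below runs two independent computations over the same array, a sum and a maximum, each in its own section. The function name sum_and_max is hypothetical. Each section is executed once by some thread of the team, and different sections may run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the sum and the maximum of v[0..n-1],
 * with each computation placed in its own section. */
void sum_and_max(const int *v, int n, long *sum, int *max)
{
    assert(v != NULL && n > 0 && sum != NULL && max != NULL);
    long s = 0;
    int m = v[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* Only this section writes s. */
            for (int i = 0; i < n; i++) s += v[i];
        }

        #pragma omp section
        {
            /* Only this section writes m. */
            for (int i = 1; i < n; i++) {
                if (v[i] > m) m = v[i];
            }
        }
    }   /* implicit barrier: both results are complete here */

    *sum = s;
    *max = m;
}
```

Sections fit cases like this one, where there is a small, fixed number of independent pieces of work rather than a loop to split.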

