# Lecture 20: Multi-core Parallel Computing with OpenMP. Parallel Regions.

## Lecture Summary

* Last time: OpenMP generalities
* This time: OpenMP nuts & bolts

## OpenMP

![Compiler directives examples (the directive goes behind \`#pragma omp\`)](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWPxtwjfZnxLR5FZYs_%2FScreen%20Shot%202021-03-22%20at%2012.24.37%20PM.png?alt=media\&token=9e5c8306-07bb-41f3-bdc3-c03f6a7a9321)

![User-level run time routines](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWPyGK8qdfNVpiiy1An%2FScreen%20Shot%202021-03-22%20at%2012.26.12%20PM.png?alt=media\&token=9324cc76-94f7-43e9-abab-dca5680e613f)

![Environment variables. This helps with bypassing the run-time function calls, but using env vars does not allow for dynamic OpenMP behavior. A function call overrides an env var setting, though.](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWPzoPbfOsAB5BoHFZR%2FScreen%20Shot%202021-03-22%20at%2012.32.59%20PM.png?alt=media\&token=4cc76663-b1c6-4d8e-9beb-002679680e0b)
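The three pieces above (directives, run-time routines, environment variables) can be seen together in one small function. This is a minimal sketch, not code from the lecture: the name `granted_team_size` is hypothetical, and the serial stand-ins under `#else` are an assumption added so the file also builds when OpenMP is not enabled.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Requests a team size at run time and reports the size actually granted. */
int granted_team_size(int requested)
{
    int team = 0;

    assert(requested > 0);
    omp_set_num_threads(requested);   /* run-time routine: overrides OMP_NUM_THREADS */

    #pragma omp parallel              /* compiler directive */
    {
        #pragma omp single
        team = omp_get_num_threads(); /* run-time routine: size of this team */
    }

    return team;
}
```

Running a program that calls `granted_team_size(2)` with `OMP_NUM_THREADS=8` set in the shell should still yield a team of at most 2 threads, because the function call takes precedence over the environment variable. The runtime may grant fewer threads than requested.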

* OpenMP: portable and scalable model for shared memory parallel applications
  * No need to dive deep and work with POSIX pthreads
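  * Under the hood, the compiler and runtime typically translate OpenMP functions and directives to calls into a native threading library (pthreads on Linux, for example)
* Structured block and OpenMP construct are the two sides of the “parallel region” coin
* A structured block has a single entry at the top and a single exit at the bottom: the only "branches" allowed out of it are `exit()` function calls (no `goto`, `break`, or `return` that jumps in or out). There is an implicit barrier at the end of a parallel region, and at the end of each work-sharing construct unless `nowait` is specified, where threads wait on each other.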
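To make the construct/structured-block pairing and the implicit barrier concrete, here is a minimal sketch. The function name `count_team_members` is hypothetical; it uses only directives, so it also compiles (and runs serially) when OpenMP is disabled.

```c
#include <assert.h>

/* Counts how many threads executed the structured block. */
int count_team_members(void)
{
    int count = 0;          /* declared outside the region, so it is shared */

    #pragma omp parallel    /* construct = directive + structured block */
    {
        /* Structured block: entered at the top, left at the bottom. */
        #pragma omp atomic
        count++;            /* every thread in the team runs this once */
    }   /* implicit barrier: all threads are done before execution continues */

    assert(count >= 1);     /* at least the initial thread ran the block */
    return count;
}
```

Because of the barrier at the closing brace, the value returned equals the team size; without OpenMP it is simply 1.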

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ3xI0kcbZyEtNFq20%2FScreen%20Shot%202021-03-22%20at%2012.55.23%20PM.png?alt=media\&token=1adcfa60-c26e-40f1-ac54-55b3fdceed31)

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ4KyzoRLHdvtw2S0e%2FScreen%20Shot%202021-03-22%20at%2012.57.03%20PM.png?alt=media\&token=1283fb53-6bfe-4439-9adc-11c7876b14e8)

### Nested Parallelism

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ4wpwOB2h9VcxWNQG%2FScreen%20Shot%202021-03-22%20at%2012.59.46%20PM.png?alt=media\&token=c92fad9d-60af-4c98-8dc2-a8eb1e015759)
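A parallel region placed inside another parallel region only gets its own team of threads if nested parallelism is enabled through the API. The sketch below is an illustration under stated assumptions: it uses `omp_set_max_active_levels` (the slide may use a different routine), the `num_threads(2)` clauses and the name `nested_executions` are chosen for the example, and the `#else` branch is a serial stand-in for builds without OpenMP.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in (assumption) so the sketch also builds without OpenMP. */
static void omp_set_max_active_levels(int n) { (void)n; }
#endif

/* Counts how many times the innermost block runs. */
int nested_executions(void)
{
    int count = 0;

    omp_set_max_active_levels(2);        /* allow two levels of active parallelism */

    #pragma omp parallel num_threads(2)  /* outer team: up to 2 threads */
    {
        #pragma omp parallel num_threads(2)  /* each outer thread forks an inner team */
        {
            #pragma omp atomic
            count++;
        }
    }

    assert(count >= 1);
    return count;
}
```

With two active levels the inner block can run up to 2 × 2 = 4 times. If only one level is active, each inner region runs with a single thread and the count drops to at most 2.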

* The nested parallelism behavior can be controlled by using the OpenMP API
* The `single` directive identifies a section of the code that must be run by exactly one thread of the team
  * The difference between `single` and `master`: `master` is always executed by the master thread (thread 0), whereas `single` is executed by any one thread, typically whichever thread reaches the region first
  * Another difference: `single` has an implicit barrier upon completion of the region, while `master` has none
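The barrier difference between `single` and `master` can be sketched as follows. The function and variable names are hypothetical. The check after the `single` block is only safe because of its implicit barrier.

```c
#include <assert.h>

/* Returns how many threads read `config` before it was written (expected: none). */
int stale_reads_after_single(void)
{
    int config = 0;        /* shared: written by one thread, read by all */
    int master_runs = 0;   /* shared: touched only by the master thread */
    int stale = 0;         /* shared: counts threads that saw the old value */

    #pragma omp parallel
    {
        #pragma omp single
        {
            config = 42;   /* executed by exactly one thread of the team */
        }   /* implicit barrier: nobody proceeds until config is written */

        if (config != 42) {
            #pragma omp atomic
            stale++;
        }

        #pragma omp master
        {
            master_runs++; /* executed by thread 0 only */
        }   /* no barrier here: the other threads continue immediately */
    }

    assert(master_runs == 1);
    return stale;
}
```

If the first block used `master` instead of `single`, the other threads could reach the `if` before thread 0 had written `config`, and an explicit `#pragma omp barrier` would be needed.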

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ6O-K0uwovOYdkqGB%2FScreen%20Shot%202021-03-22%20at%201.06.02%20PM.png?alt=media\&token=6fe05aea-5bd2-4f67-b097-06d1c8ec2c7a)

### Work Sharing

* Work sharing is a general term used in OpenMP to describe the distribution of work across threads
* The three main constructs for automatic work division are:
  * omp for
  * omp sections
  * omp task

### omp for

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ7rSOacz3mKGhmAWW%2FScreen%20Shot%202021-03-22%20at%201.12.29%20PM.png?alt=media\&token=6b69afc1-b196-483f-85bb-883cb417195a)

* A `#pragma omp for` placed directly inside a `#pragma omp parallel` region that contains nothing else is equivalent to `#pragma omp parallel for`
* Most OpenMP implementations use default block partitioning, where each thread is assigned one contiguous block of roughly `n/thread_count` iterations. This may lead to load imbalance if the work per iteration varies
  * The `schedule` clause comes to the rescue!
  * Usage example: `#pragma omp parallel for schedule(static, 8)` hands out chunks of 8 iterations to the threads in round-robin order
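Both points can be sketched in a few lines: the combined and split forms of the loop directive, and a `schedule` clause applied to a loop whose cost per iteration is uneven. The function names and the triangular-sum workload are made up for illustration.

```c
#include <assert.h>

/* Cost grows with i, so equal-sized contiguous blocks would be unbalanced. */
static long triangular(int i)
{
    long s = 0;
    for (int j = 0; j <= i; j++)
        s += j;
    return s;
}

/* Combined form, using the schedule from the usage example above. */
void fill_combined(long *out, int n)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static, 8)
    for (int i = 0; i < n; i++)
        out[i] = triangular(i);
}

/* Split form: a work-sharing loop inside an explicit parallel region. */
void fill_split(long *out, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for schedule(static, 8)
        for (int i = 0; i < n; i++)
            out[i] = triangular(i);
    }
}
```

Both functions produce the same array. With chunks of 8 dealt out round-robin, every thread receives a mix of cheap (small `i`) and expensive (large `i`) iterations, rather than one thread getting all the expensive ones.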

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ9Op2h05CJX-260_A%2FScreen%20Shot%202021-03-22%20at%201.19.13%20PM.png?alt=media\&token=0b770d70-934b-42b9-9ac5-58c5d8247e6f)

![Effects of different schedules, assuming 3 threads](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ9lQXTJC4JTmbSkAx%2FScreen%20Shot%202021-03-22%20at%201.20.49%20PM.png?alt=media\&token=0aa1dc01-ea96-4775-943d-d865dc4a41f0)

![Choosing a schedule](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQ9un8IUUZ2x-N6Amt%2FScreen%20Shot%202021-03-22%20at%201.21.28%20PM.png?alt=media\&token=825eb673-7b02-417e-8e73-24f41e384c5e)

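* OpenMP will only parallelize `for` loops that are in canonical form: the iteration count must be computable before the loop starts, which requires a loop variable with a simple initialization, a comparison against a loop-invariant bound, a loop-invariant increment, and no `break` out of the loop. Loops that do not fit this form may be rejected by the compiler or behave counterintuitively
* The `collapse` clause supports collapsing nested loops into one uber loop
  * For example, if the outer loop has 10 iterations, the inner loop has 10^7 iterations, and we have 32 threads: parallelizing the outer loop is bad (10 < 32, so at least 22 threads sit idle), parallelizing the inner loop is good, but we can do better using `collapse(2)`, which distributes all 10 × 10^7 = 10^8 iterations across the threads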
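A minimal sketch of `collapse` on a doubly nested loop follows. The function `fill_grid` and the formula it stores are hypothetical; the points to notice are that both loops are in canonical form and that nothing sits between the two `for` headers.

```c
#include <assert.h>
#include <stddef.h>

/* Fills a rows x cols grid stored in row-major order. */
void fill_grid(double *grid, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);

    /* collapse(2) fuses the i and j loops into one iteration space of
       rows * cols iterations, which is then divided among the threads. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            grid[(size_t)i * cols + j] = i + 0.5 * j;
}
```

With `rows = 10` and 32 threads, parallelizing only the outer loop could keep at most 10 threads busy, whereas the collapsed loop gives every thread a share of the work.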

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQBmo703WkJX7H36YC%2FScreen%20Shot%202021-03-22%20at%201.29.16%20PM.png?alt=media\&token=df0433a5-1761-4d22-b33b-0c8c78fde6a9)

### omp sections

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQC0wYU64VC7X_6cjq%2FScreen%20Shot%202021-03-22%20at%201.30.42%20PM.png?alt=media\&token=c36b11a6-ed10-4cff-a080-ef425dd7ef48)

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQCApH6GluvFoFZZLA%2FScreen%20Shot%202021-03-22%20at%201.31.20%20PM.png?alt=media\&token=896d4c2a-14ac-4cf6-b53e-a938cec57978)

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQCaCEH-vq5kTPLx-f%2FScreen%20Shot%202021-03-22%20at%201.33.09%20PM.png?alt=media\&token=9436e2ac-6b94-4400-9cfa-05304b89211d)

![](https://1313833672-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMTslgmrrtRXvxD2lk9%2F-MWPw-i2II2NYWBtAkg4%2F-MWQCcvbiRaXqkfBcdvx%2FScreen%20Shot%202021-03-22%20at%201.33.20%20PM.png?alt=media\&token=c5b12c7e-2387-41c1-acf0-1b9be305f60f)
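As a complement to the slides, here is a minimal sketch of the `sections` construct, where each `section` is an independent piece of work handed to one thread. The function `min_and_max` is a made-up example, not taken from the lecture.

```c
#include <assert.h>

/* Computes the minimum and maximum of a[0..n-1] as two independent sections. */
void min_and_max(const int *a, int n, int *min_out, int *max_out)
{
    assert(n > 0);

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int m = a[0];
            for (int i = 1; i < n; i++)
                if (a[i] < m) m = a[i];
            *min_out = m;   /* only this section writes *min_out */
        }

        #pragma omp section
        {
            int m = a[0];
            for (int i = 1; i < n; i++)
                if (a[i] > m) m = a[i];
            *max_out = m;   /* only this section writes *max_out */
        }
    }   /* implicit barrier: both results are ready past this point */
}
```

The two sections read the same array but write to different locations, so no synchronization is needed between them. With more threads than sections, the extra threads simply have nothing to do in this construct.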
