# Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP.

## Lecture Summary

* Last time
  * Wrap up synchronization
  * OpenMP rules of thumb
  * Parallel computing with OpenMP: NUMA aspects
* Today
  * Parallel computing, multi-core: how caches come into play
  * Critical thinking, and similar tricks for speeding up your code

## Caches in a Multi-Core Setup

* Consistency vs. Coherence
  * Consistency establishes a set of rules that governs the collective actions of the threads relative to the entire system memory
    * Think of it this way: there are at least two memory entries that come up in the discussion
  * Coherence concerns the behavior that a single memory location must exhibit when it is accessed by multiple threads running on multiple cores
    * Think of it this way: there is exactly one memory entry that comes up in the discussion
* Two established approaches for enforcing cache coherence
  * Directory-based: a directory acts as a filter through which any change to a cache must pass. When an entry is changed, the directory either updates or invalidates the copies of that entry held by the other caches
  * Snooping-based
    * Example: MESI protocol
      * 4 states: modified, exclusive, shared, invalid
* Simplified coherence model: assume each cache line can exist in only one of 3 states
  * Exclusive: the only valid copy in any cache
  * Read-only: A valid copy but other caches may contain it
  * Invalid: Out of date and cannot be used
* In this simplified coherency model,
  * A read on an invalid or absent cache line will be cached as read-only or exclusive
  * A write on a line not in an exclusive state will cause all other copies to be marked invalid and the written line to be marked exclusive
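
The transitions of the simplified 3-state model can be sketched as a small state machine. This is only an illustration, not how any real hardware implements coherence: the names `Line`, `line_init`, `line_read`, `line_write`, and `NUM_CACHES` are hypothetical, and the choice to downgrade an exclusive holder to read-only when another cache reads the line is an assumption made so that "exclusive" keeps meaning "the only valid copy".

```c
#include <assert.h>

#define NUM_CACHES 4 /* hypothetical number of private caches */

/* States of one cache line in the simplified 3-state model. */
typedef enum { INVALID, READ_ONLY, EXCLUSIVE } LineState;

/* state[i] is the state of the SAME memory line as seen by cache i. */
typedef struct {
    LineState state[NUM_CACHES];
} Line;

/* Initially no cache holds the line. */
void line_init(Line *l) {
    for (int i = 0; i < NUM_CACHES; i++)
        l->state[i] = INVALID;
}

/* Cache `who` reads the line. */
void line_read(Line *l, int who) {
    assert(who >= 0 && who < NUM_CACHES);
    if (l->state[who] != INVALID)
        return; /* already holds a valid copy: cache hit */

    int held_elsewhere = 0;
    for (int i = 0; i < NUM_CACHES; i++) {
        if (i == who || l->state[i] == INVALID)
            continue;
        held_elsewhere = 1;
        /* Assumption: an exclusive holder is downgraded, since it is
           no longer the only valid copy. */
        if (l->state[i] == EXCLUSIVE)
            l->state[i] = READ_ONLY;
    }
    /* Cached as read-only if shared, exclusive if this is the only copy. */
    l->state[who] = held_elsewhere ? READ_ONLY : EXCLUSIVE;
}

/* Cache `who` writes the line. */
void line_write(Line *l, int who) {
    assert(who >= 0 && who < NUM_CACHES);
    if (l->state[who] == EXCLUSIVE)
        return; /* already the only valid copy: no coherence traffic */

    /* Every other copy is marked invalid, the written line exclusive. */
    for (int i = 0; i < NUM_CACHES; i++)
        if (i != who)
            l->state[i] = INVALID;
    l->state[who] = EXCLUSIVE;
}
```

Calling `line_read` from two different caches and then `line_write` from a third walks through all three states; each invalidation in `line_write` stands for the coherence traffic that a real write to a shared line would generate.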

![Snooping-based](/files/-MXylVuz9izNRNMKdG5D)

![4 states of the MESI protocol](/files/-MXym6YPakEnmkIjfeE5)

### False Sharing

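* Each cache line is typically 64 bytes long and can store, for instance, 8 doubles or 16 ints (assuming 4-byte ints). Coherence is tracked per line, not per entry: as soon as one entry in a cache line is written, the copies of the entire line held by other caches are invalidated, even though the other values in the line did not change.
* False sharing happens when two threads running on different cores write to different locations within the same cache line: no data is actually shared, yet the line keeps bouncing between the two caches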
* Symptoms: Poor performance, high numbers of cache misses, unexpected load imbalance
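
A minimal sketch of false sharing and one common remedy, padding, is given below. It assumes a 64-byte cache line; `CACHE_LINE`, `NUM_THREADS`, `PaddedCounter`, `count_packed`, and `count_padded` are hypothetical names chosen for illustration. Both functions compute the same result and differ only in how the per-thread counters are laid out in memory.

```c
#include <assert.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */
#define NUM_THREADS 4  /* hypothetical thread count */

/* Each element occupies exactly CACHE_LINE bytes, so the `value` fields
   of neighbouring array elements never fall in the same cache line. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} PaddedCounter;

/* Packed layout: the counters sit next to each other in memory, so
   several of them share one cache line and every increment can
   invalidate that line in the other cores' caches. */
long count_packed(long iters) {
    long counters[NUM_THREADS] = {0};

    #pragma omp parallel for num_threads(NUM_THREADS)
    for (int t = 0; t < NUM_THREADS; t++) {
        /* volatile forces a real memory update on every iteration. */
        volatile long *c = &counters[t];
        for (long i = 0; i < iters; i++)
            (*c)++;
    }

    long sum = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        sum += counters[t];
    return sum;
}

/* Padded layout: same work, but each counter lives in its own line. */
long count_padded(long iters) {
    PaddedCounter counters[NUM_THREADS] = {{0}};

    #pragma omp parallel for num_threads(NUM_THREADS)
    for (int t = 0; t < NUM_THREADS; t++) {
        volatile long *c = &counters[t].value;
        for (long i = 0; i < iters; i++)
            (*c)++;
    }

    long sum = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        sum += counters[t].value;
    return sum;
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC or Clang) and timed with a large `iters`, the padded version is often noticeably faster, although the gap depends on the hardware and compiler. Without OpenMP the pragmas are ignored and both functions simply run serially. Padding trades memory for speed; accumulating into a thread-private local variable and writing the result back once is another way to avoid the shared line.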

## Critical Thinking

* This module brings together knowledge about
  * Compilers and how they work
  * Memory aspects: Pointers, hierarchy, latencies, bandwidths
  * Instruction Level Parallelism (pipelining, jump instructions, branch prediction, wide registers, etc.)

