Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP.

Lecture Summary

Last time
- Wrap up synchronization
- OpenMP rules of thumb
- Parallel computing with OpenMP: NUMA aspects
Today
- Parallel computing, multi-core: how caches come into play
- Critical thinking, and similar tricks for speeding up your code

Consistency vs. Coherence
- Consistency establishes a set of rules that governs the collective actions of the threads relative to the entire system memory
  - Think of it this way: there are at least two memory entries that come up in the discussion
- Coherence concerns the behavior that a single memory location must display relative to transactions carried out by multiple threads running on multiple cores
  - Think of it this way: there is exactly one memory entry that comes up in the discussion
Two established approaches for enforcing cache coherence
- Directory-based: A directory acts as a filter through which any change to a cached entry must pass. When an entry is changed, the directory either updates or invalidates the copies of that entry held in the other caches
- Snooping-based
  - Example: MESI protocol
    - 4 states: modified, exclusive, shared, invalid
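As a rough illustration of the directory-based approach, the sketch below tracks which caches hold a copy of one entry and reports which copies must be invalidated on a write. The names `DirEntry`, `dir_read`, and `dir_write` are hypothetical, the 32-cache limit is an assumption of this sketch, and only the invalidate variant (not the update variant) is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical directory entry for one memory entry:
   bit i is set when cache i holds a copy. Supports up to 32 caches. */
typedef struct {
    uint32_t sharers;
} DirEntry;

/* Cache `who` reads the entry: the directory records it as a sharer. */
static void dir_read(DirEntry *e, unsigned who)
{
    assert(who < 32);
    e->sharers |= (uint32_t)1 << who;
}

/* Cache `who` writes the entry. Returns the bitmask of the other caches
   whose copies must be invalidated; afterwards only the writer is listed. */
static uint32_t dir_write(DirEntry *e, unsigned who)
{
    assert(who < 32);
    uint32_t self = (uint32_t)1 << who;
    uint32_t to_invalidate = e->sharers & ~self;
    e->sharers = self;
    return to_invalidate;
}
```

A snooping protocol reaches the same outcome without a central table: every cache watches the shared bus and invalidates its own copy when it sees a write to that entry.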
A simplified coherence model: assume each cache line can exist in only one of 3 states
- Exclusive: the only valid copy in any cache
- Read-only: A valid copy but other caches may contain it
- Invalid: Out of date and cannot be used
In this simplified coherency model,
- A read on an invalid or absent cache line will be cached as read-only or exclusive
- A write on a line not in an exclusive state will cause all other copies to be marked invalid and the written line to be marked exclusive
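The two rules above can be sketched as a small state machine over one cache line, with one state per cache. This is a minimal sketch, not a real protocol: the names `LineState`, `line_read`, and `line_write` are hypothetical, and downgrading an exclusive holder to read-only when another cache reads the line is an assumption that the rules above do not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* The three states of the simplified model. */
typedef enum { INVALID, READ_ONLY, EXCLUSIVE } LineState;

/* Cache `who` reads the line; state[i] is the line's state in cache i. */
static void line_read(LineState state[], size_t ncaches, size_t who)
{
    assert(who < ncaches);
    if (state[who] != INVALID)
        return;                    /* hit: a valid copy is already cached */

    int others_valid = 0;
    for (size_t i = 0; i < ncaches; ++i) {
        if (i != who && state[i] != INVALID) {
            others_valid = 1;
            state[i] = READ_ONLY;  /* assumption: exclusive holder downgrades */
        }
    }
    /* Exclusive if this is the only valid copy, read-only otherwise. */
    state[who] = others_valid ? READ_ONLY : EXCLUSIVE;
}

/* Cache `who` writes the line: all other copies become invalid. */
static void line_write(LineState state[], size_t ncaches, size_t who)
{
    assert(who < ncaches);
    for (size_t i = 0; i < ncaches; ++i)
        if (i != who)
            state[i] = INVALID;
    state[who] = EXCLUSIVE;
}
```

Alternating writes from two caches flip the line between exclusive and invalid on every access, which is the traffic pattern behind false sharing.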

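Each cache line is typically 64 bytes long and can store, for instance, 8 doubles or 16 ints. Coherence is tracked per cache line, not per value: as soon as one entry in a cache line is changed, the whole line is marked dirty and the copies of that line held in other caches are invalidated.
False sharing happens when two threads running on different cores are both writing into different locations within the same cache line. The threads never touch the same data, yet each write invalidates the other core's copy of the line, so the line keeps moving back and forth between caches.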
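The arithmetic can be sketched with a small helper that maps an array index to the cache line it falls on. The helper name `cache_line_of` is hypothetical, and the sketch assumes 64-byte lines and an array whose first element starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumption: the typical size quoted above */

/* Cache line (counted from the start of the array) that holds element
   `index` of an array whose elements are `elem_size` bytes each. */
static size_t cache_line_of(size_t index, size_t elem_size)
{
    assert(elem_size > 0);
    return (index * elem_size) / CACHE_LINE_BYTES;
}
```

With 8-byte doubles, elements 0 through 7 map to line 0 and element 8 starts line 1, so two threads writing to elements 3 and 5 are writing into the same line.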
Symptoms: Poor performance, high numbers of cache misses, unexpected load imbalance
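One common remedy is to pad per-thread data so that each thread writes to its own cache line. The sketch below assumes 64-byte lines and a fixed team of 4 threads. The names `PaddedDouble` and `sum_padded` are hypothetical, and the code falls back to a serial loop when compiled without OpenMP support.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE_BYTES 64  /* assumption: 64-byte cache lines */
#define NTHREADS 4           /* hypothetical thread count for this sketch */

/* One accumulator per thread. The padding makes each element 64 bytes,
   so two threads' `value` fields never share a cache line. */
typedef struct {
    double value;
    char pad[CACHE_LINE_BYTES - sizeof(double)];
} PaddedDouble;

double sum_padded(const double *x, long n)
{
    assert(n >= 0);
    PaddedDouble partial[NTHREADS] = {{0.0, {0}}};

    #pragma omp parallel num_threads(NTHREADS)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Each thread accumulates only into its own padded slot. */
        #pragma omp for
        for (long i = 0; i < n; ++i)
            partial[tid].value += x[i];
    }

    /* Combine the per-thread partial sums serially. */
    double total = 0.0;
    for (int t = 0; t < NTHREADS; ++t)
        total += partial[t].value;
    return total;
}
```

Replacing `PaddedDouble` with a plain `double partial[NTHREADS]` packs all four accumulators into one line and reintroduces the false sharing. In practice an OpenMP `reduction(+:total)` clause or a thread-private local accumulator is usually the simpler fix.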

This module brings together knowledge about
- Compilers and how they work
- Memory aspects: Pointers, hierarchy, latencies, bandwidths
- Instruction Level Parallelism (pipelining, jump instructions, branch prediction, wide registers, etc.)

