Lecture 23: OpenMP NUMA Aspects. Caching and OpenMP.

Lecture Summary

  • Last time

    • Wrap up synchronization

    • OpenMP rules of thumb

    • Parallel computing with OpenMP: NUMA aspects

  • Today

    • Parallel computing, multi-core: how caches come into play

    • Critical thinking and related tricks for speeding up your code

Caches in a Multi-Core Setup

  • Consistency vs. Coherence

    • Consistency establishes the rules that govern the collective actions of all threads with respect to the entire system memory, i.e., the order in which accesses to different locations become visible to the threads

      • Think of it this way: at least two memory locations come up in the discussion

    • Coherence concerns the expected behavior of one memory location when it is read and written by multiple threads running on multiple cores

      • Think of it this way: exactly one memory location comes up in the discussion

  • Two established approaches for enforcing cache coherence

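    • Directory-based: a directory acts as a filter through which every change to a cached entry must pass. When an entry is changed, the directory either updates or invalidates the other caches holding that entry

    • Snooping-based: each cache controller monitors ("snoops") the shared bus or interconnect and invalidates or updates its own copy when another core writes to a line it holds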

      • Example: MESI protocol

        • 4 states: modified, exclusive, shared, invalid

  • For simplicity, assume each cache line can be in only one of 3 states

    • Exclusive: the only valid copy in any cache

    • Read-only: a valid copy that other caches may also hold

    • Invalid: out of date and cannot be used

  • In this simplified coherency model,

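    • A read of an invalid or absent cache line brings the line in as read-only if other caches also hold a valid copy (an exclusive copy elsewhere is downgraded to read-only), or as exclusive if no other cache holds it

    • A write to a line that is not in the exclusive state causes all other copies to be marked invalid and the written line to be marked exclusive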

False Sharing

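  • On most current CPUs a cache line is 64 bytes long and can store, for instance, 8 doubles or 16 4-byte ints. Coherence is tracked per line, so as soon as one entry in a cache line is written, the whole line is marked modified and the copies held by other cores are invalidated, even though their other values did not change.

  • False sharing happens when two or more threads on different cores write to different locations that happen to lie within the same cache line. Each write invalidates the other cores' copies, so the line bounces between caches even though no data is actually shared.

  • Symptoms: poor performance, a high number of cache misses, unexpected load imbalance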

Critical Thinking

  • This module brings together knowledge about

    • Compilers and how they work

    • Memory aspects: Pointers, hierarchy, latencies, bandwidths

    • Instruction Level Parallelism (pipelining, jump instructions, branch prediction, wide registers, etc.)
