Lecture 21: OpenMP Work Sharing.
Last time: OpenMP nested parallelism, work sharing (for loops, sections)
Today
OpenMP: nested parallelism, work sharing (tasks)
OpenMP: variable scoping, synchronization, loose ends
Ending example
Pros: Allows parallelization of irregular problems
Unbounded loops
Recursive algorithms
Producer/consumer
Cons: Tasks are relatively tricky to use and introduce some overhead
Motivations
OpenMP was originally tailored to large array-based applications
For example, for a long time OpenMP had no direct way to parallelize the traversal of a dynamic list. The available workarounds were costly:
Storing pointers to list elements in an array: High overhead for array construction (not easy to parallelize)
Using single nowait inside a parallel region: High cost of the single construct. Also, each thread needs to traverse the entire list to determine if another thread has already processed that element
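With tasks, the list traversal described above no longer needs either workaround. The sketch below is one minimal way to do it, not the lecture's own code: the node_t type, process(), and the helper functions are hypothetical names introduced for illustration. One thread walks the list and creates a task per node; the other threads in the team execute the tasks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node, introduced only for this illustration. */
typedef struct node {
    int value;
    int result;
    struct node *next;
} node_t;

/* Hypothetical per-element work; each call touches only its own node. */
static void process(node_t *p) { p->result = p->value * p->value; }

/* Build a list holding the values 0..n-1. */
node_t *make_list(int n) {
    assert(n >= 0);
    node_t *head = NULL;
    for (int i = n - 1; i >= 0; --i) {
        node_t *p = malloc(sizeof *p);
        assert(p != NULL);
        p->value = i;
        p->result = -1;
        p->next = head;
        head = p;
    }
    return head;
}

void free_list(node_t *head) {
    while (head != NULL) {
        node_t *next = head->next;
        free(head);
        head = next;
    }
}

/* One thread traverses the list and generates one task per node. */
void process_list(node_t *head) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (node_t *p = head; p != NULL; p = p->next) {
                /* firstprivate(p): each task captures the pointer value
                   it saw when the task was created. */
                #pragma omp task firstprivate(p)
                process(p);
            }
        } /* implicit barrier after single: all generated tasks are done */
    }
}

long sum_results(const node_t *head) {
    long sum = 0;
    for (const node_t *p = head; p != NULL; p = p->next) sum += p->result;
    return sum;
}
```

The list is traversed exactly once and no auxiliary array is built. If the compiler ignores the pragmas, the same code runs sequentially and produces the same results.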
Who does what and when?
The developer
Uses a pragma to specify where & what the tasks are
Ensures that there are no dependencies (that is, tasks can be executed independently)
The OpenMP runtime system
Generates a new task whenever a thread encounters a task construct
Decides the moment of execution (can be immediate or delayed)
Definition: A task is a specific instance of executable code together with its data environment (the shared & private data manipulated by the task) and its ICVs (internal control variables: settings that govern thread scheduling and related runtime behavior, typically initialized from OpenMP environment variables)
Synchronization issues. Solution: use task barriers (#pragma omp barrier, #pragma omp taskwait) to ensure the completion of tasks.
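To illustrate why taskwait is needed, here is a minimal sketch of a recursive algorithm parallelized with tasks. The function names fib and fib_parallel are illustrative assumptions, and a practical version would stop creating tasks below some cutoff to limit overhead. Each call spawns two child tasks and must wait for both before it can combine their results.

```c
#include <assert.h>

/* Recursive Fibonacci: each call spawns two child tasks. */
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long x, y;
    /* x and y must be shared so the children write to the parent's
       variables; n is firstprivate in each task by default. */
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    /* Wait for both child tasks; without this, x and y could be read
       before they have been written. */
    #pragma omp taskwait
    return x + y;
}

/* Entry point: create the thread team, let one thread start the recursion. */
long fib_parallel(int n) {
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Note that taskwait only waits for the child tasks of the current task, whereas a barrier waits for all tasks generated so far in the parallel region.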
Threads have access to a pool of memory that is shared
Threads can also have private data
Basic rule: Any variable declared prior to a parallel region is shared in that parallel region
The private clause gives each thread its own copy of every variable listed in the pragma; these copies are uninitialized on entry to the region
Some variables are treated as private by default:
Stack (local) variables in functions called from within parallel regions
Loop iteration variables
Automatic variables within a statement block
When in doubt, always explicitly indicate something to be private
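The default scoping rules above can be seen in a small sketch. The function name square_all and its variables are hypothetical, chosen only to show one variable of each kind.

```c
#include <assert.h>

/* Writes in[i]^2 to out[i]; in and out are assumed not to overlap. */
void square_all(const int *in, int *out, int n) {
    assert(n >= 0);
    int tmp = 0; /* declared before the region: shared unless listed as private */

    /* in, out, and n are declared before the region, so they are shared.
       tmp is made private explicitly; leaving it shared would be a race. */
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; ++i) { /* loop iteration variable: private by default */
        tmp = in[i] * in[i];      /* written before it is read: private copies
                                     start out uninitialized */
        int local = tmp;          /* automatic variable in the block: private */
        out[i] = local;
    }
}
```

Removing private(tmp) would let all threads write to the same tmp, which is exactly the kind of scoping error the rule "when in doubt, make it private" guards against.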
firstprivate: Specifies that each thread should have its own instance of a variable. Moreover, each instance is initialized with the value of the variable of the same name from the master thread
Usage: #pragma omp parallel num_threads(4) firstprivate(i)
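A minimal sketch of the usage line above in context. The function fill_with_offset and its parameters are hypothetical. With plain private(i) the per-thread copies would start uninitialized; firstprivate(i) starts each copy at the value i had just before the region.

```c
#include <assert.h>

/* Fills out[k] with base + 1 + k for k = 0..n-1. */
void fill_with_offset(int *out, int n, int base) {
    assert(n >= 0);
    int i = base;
    #pragma omp parallel num_threads(4) firstprivate(i)
    {
        /* Each thread's private i starts at base, not at garbage. */
        i += 1; /* with OpenMP enabled, this changes only the thread's own copy */

        #pragma omp for
        for (int k = 0; k < n; ++k) {
            out[k] = i + k;
        }
    }
}
```

Every thread computes with the same starting value, so the output does not depend on which thread handles which iteration.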
lastprivate: The enclosing context's version of the variable is set equal to the private version of whichever thread executes the sequentially final iteration of a for loop or the lexically last section of a sections construct
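A minimal sketch of lastprivate on a work-sharing loop; the function name last_square is a hypothetical chosen for illustration. Whichever thread runs iteration n-1 copies its private value back to the enclosing variable when the loop ends.

```c
#include <assert.h>

/* Returns the value computed in the final iteration, i.e. (n-1)^2. */
int last_square(int n) {
    assert(n > 0);
    int last = -1;
    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < n; ++i) {
        last = i * i; /* each thread writes its own private copy */
    }
    /* last now holds the value from iteration i == n - 1, regardless of
       which thread executed it or when. */
    return last;
}
```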
Data scoping is a common source of errors in OpenMP. It is the programmer's responsibility to make sure data dependencies do not lead to race conditions
Explicit barrier: #pragma omp barrier
Implicit barriers: at the end of parallel, for, single, and sections constructs
Unnecessary barriers hurt performance and can be removed with the nowait clause (applicable to for, single, sections)
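As a minimal sketch of removing an unnecessary barrier, consider two loops that touch unrelated arrays. The function update_two and its parameters are hypothetical, and the arrays a and b are assumed not to overlap. Since the second loop does not read anything the first loop writes, threads that finish their share of the first loop can start the second one immediately.

```c
#include <assert.h>

/* Scales a[] by s and adds 1.0 to every element of b[].
   Assumption: a and b do not overlap. */
void update_two(double *a, int na, double *b, int nb, double s) {
    assert(na >= 0 && nb >= 0);
    #pragma omp parallel
    {
        /* nowait: no barrier at the end of this loop. Safe only because
           the next loop is independent of a[]. */
        #pragma omp for nowait
        for (int i = 0; i < na; ++i) {
            a[i] *= s;
        }

        /* The implicit barrier at the end of this loop is kept. */
        #pragma omp for
        for (int j = 0; j < nb; ++j) {
            b[j] += 1.0;
        }
    }
}
```

If the second loop depended on a[], the nowait clause would introduce a race and would have to be removed.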