Lecture 27: MPI Parallel Programming. Point-to-point communication: blocking vs. non-blocking sends.
Lecture Summary
- Last time:
  - HPC via MPI
  - MPI point-to-point communication: the blocking flavor

- Today:
  - Wrap up point-to-point communication
  - Collective communication
 
Point-to-point communication
- Different "send" modes:
  - Synchronous send: MPI_SSEND
    - Risk of deadlock/waiting -> idle time
    - High latency, but better bandwidth than buffered send
  - Buffered (async) send: MPI_BSEND
    - Low latency, but lower bandwidth
  - Standard send: MPI_SEND
    - Up to the MPI implementation to decide whether to use the rendezvous or the eager protocol
    - Less overhead in eager mode
    - Blocks in rendezvous mode, effectively switching to synchronous behavior
  - Ready send: MPI_RSEND
    - Works only if the matching receive has already been posted
    - Rarely used and very dangerous: erroneous if the receive has not been posted first
- Receiving, all modes: MPI_RECV (a blocking send/receive sketch follows below)
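A minimal sketch of a blocking exchange in C, assuming a two-rank run; the payload, tag, and rank numbers are illustrative:

```c
/* Blocking point-to-point: standard send paired with a blocking receive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;  /* illustrative data */
    if (rank == 0) {
        /* Blocking standard send: returns once the buffer is safe to reuse */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* Blocking receive: returns once the message has arrived */
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```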
- Buffered send:
  - Reduces overhead associated with data transmission
  - Relies on the existence of a buffer, which the user attaches via MPI_Buffer_attach. Buffering incurs an extra memory copy
  - Return from an MPI_Bsend does not guarantee the message was sent: the message may remain in the buffer until a matching receive is posted (see the sketch below)
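A minimal buffered-send sketch, assuming a two-rank run; sizing the buffer with MPI_BSEND_OVERHEAD is the standard pattern, while the payload is illustrative:

```c
/* Buffered send: the user supplies the buffer that MPI_Bsend copies into. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 7;
    if (rank == 0) {
        /* Attach a buffer large enough for the message plus MPI's
           bookkeeping overhead (MPI_BSEND_OVERHEAD). */
        int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        void *buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);

        /* Returns as soon as the message is copied into the buffer */
        MPI_Bsend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Detach blocks until all buffered messages have been delivered */
        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```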
 


Non-blocking point-to-point
- Blocking send: Covered above. Upon return from a blocking send, you may safely modify the send buffer, since the data has already been copied out of it (to a system buffer or onto the network) 
- Non-blocking send: The sender returns immediately, with no guarantee that the data has been transmitted
  - Routine names start with MPI_I (e.g., MPI_Isend, MPI_Irecv)
  - The caller gets to do useful work upon return from the non-blocking call (overlap communication with computation)
  - Use a synchronization call to wait for the communication to complete
 
- MPI_Wait: Blocks until a given request completes
  - Waiting on multiple requests: MPI_Waitall, MPI_Waitany, MPI_Waitsome (used in the sketch below)
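A minimal sketch of the non-blocking pattern, assuming a two-rank run that exchanges one int in each direction:

```c
/* Overlap communication with computation via MPI_Isend/MPI_Irecv + MPI_Waitall. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int sendval = rank, recvval = -1;
    int peer = 1 - rank;  /* the other rank, assuming exactly two ranks */
    MPI_Request reqs[2];

    /* Both calls return immediately; the buffers must not be touched
       until the requests complete. */
    MPI_Isend(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do useful work here, overlapping with communication ... */

    /* Block until both the send and the receive have completed */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("Rank %d received %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}
```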
 
- MPI_Test: Non-blocking, returns quickly with status information
  - int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
  - A polling sketch follows below
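A minimal polling sketch with MPI_Test, assuming a two-rank run; the work done inside the loop is left as a placeholder comment:

```c
/* Poll for completion with MPI_Test instead of blocking in MPI_Wait. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, value = 123;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request req;
        MPI_Status status;
        int flag = 0;
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        while (!flag) {
            /* ... do a slice of useful work here ... */
            MPI_Test(&req, &flag, &status);  /* flag becomes 1 on completion */
        }
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```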
 
- MPI_Probe: Allows incoming messages to be queried (e.g., for their size) prior to receiving them (see the sketch below)
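A minimal MPI_Probe sketch, assuming a two-rank run: the receiver sizes its buffer with MPI_Get_count before posting the receive; the 4-int message is illustrative:

```c
/* Query an incoming message's size before allocating the receive buffer. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        int count;

        /* Block until a message from rank 0 is available, without receiving it */
        MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
        /* Ask how many MPI_INT elements the pending message carries */
        MPI_Get_count(&status, MPI_INT, &count);

        int *buf = malloc(count * sizeof(int));
        MPI_Recv(buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d ints\n", count);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```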

Collective communications
- Three types of collective actions:
  - Synchronization (barrier)
  - Communication (e.g., broadcast)
  - Operation (e.g., reduce)
 
- The PyTorch tutorial "Writing Distributed Applications with PyTorch" is a good reference for these patterns
- Broadcast: MPI_Bcast 
- Gather: MPI_Gather 
- Scatter: MPI_Scatter 
- Reduce: MPI_Reduce
  - Result is collected by the root only
- Allreduce: MPI_Allreduce
  - Result is sent out to all ranks in the communicator (see the sketch below contrasting the two)
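A minimal sketch contrasting MPI_Reduce with MPI_Allreduce; each rank contributes its rank id, and the sum operation is illustrative:

```c
/* MPI_Reduce delivers the result to the root only; MPI_Allreduce to all ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank, sum = 0;

    /* Only rank 0 ends up with the sum of all contributions */
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("Reduce: root sees sum = %d\n", sum);

    /* Every rank ends up with the same sum */
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Allreduce: rank %d sees sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}
```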
 
- Prefix scan: MPI_Scan 
- User-defined reduction operations: Register using MPI_Op_create() (a sketch follows below)
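A minimal sketch of registering a user-defined reduction with MPI_Op_create(); the element-wise absolute-value maximum operation is illustrative, not from the lecture:

```c
/* Register a custom reduction operation and use it in an Allreduce. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Combining function: inoutvec[i] = op(invec[i], inoutvec[i]) */
void abs_max(void *invec, void *inoutvec, int *len, MPI_Datatype *dtype) {
    (void)dtype;  /* single-type example; datatype not inspected */
    int *in = (int *)invec, *inout = (int *)inoutvec;
    for (int i = 0; i < *len; i++) {
        int a = abs(in[i]), b = abs(inout[i]);
        inout[i] = (a > b) ? a : b;
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(abs_max, 1 /* commutative */, &op);

    int local = (rank % 2) ? -rank : rank;  /* illustrative data */
    int result;
    MPI_Allreduce(&local, &result, 1, MPI_INT, op, MPI_COMM_WORLD);
    if (rank == 0) printf("abs-max over ranks = %d\n", result);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}
```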

