Compute-Communication Overlap

Appears in 1 paper

Running computation and data transfer at the same time, so that communication cost is hidden behind useful work instead of adding to the total runtime.

As used in Paper 19 — Ring Attention with Blockwise Transformers for Near-Infinite Context →

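In Ring Attention, the GPUs are arranged in a ring. While GPU i computes blockwise attention on the KV chunk it currently holds, it also sends that chunk to GPU i+1 and receives the next chunk from GPU i-1 (indices wrap around the ring). The transfer is fully hidden when the compute time per chunk is at least the communication time per chunk (compute time ≥ communication time); otherwise the difference shows up as idle time on every step.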
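
The condition above can be illustrated with a toy cost model. This is a minimal sketch, not code from the paper: the function name `ring_step_times`, its parameters, and the assumption that every chunk takes the same compute time and the same transfer time are all hypothetical simplifications introduced here.

```python
def ring_step_times(compute_s: float, comm_s: float, num_devices: int) -> dict:
    """Toy per-device cost model for one full pass around the ring.

    Assumes (hypothetically) that every KV chunk costs `compute_s` seconds
    of attention compute and `comm_s` seconds to transfer to a neighbor.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")

    # Each device processes num_devices chunks; all but its own arrive over the ring.
    transfers = num_devices - 1

    # No overlap: every transfer is paid in full on top of the compute.
    serial = num_devices * compute_s + transfers * comm_s

    # Overlap: the next chunk is in flight while the current one is computed,
    # so each step with a transfer costs max(compute, comm). The final step
    # has nothing left to fetch and costs compute only.
    overlapped = transfers * max(compute_s, comm_s) + compute_s

    # Communication time that could not be hidden behind compute.
    exposed = transfers * max(0.0, comm_s - compute_s)

    return {"serial": serial, "overlapped": overlapped, "exposed_comm": exposed}


# Compute-bound case: compute (4 s) >= communication (3 s), so nothing is exposed.
hidden = ring_step_times(compute_s=4.0, comm_s=3.0, num_devices=8)

# Communication-bound case: compute (2 s) < communication (3 s), so 1 s per transfer is exposed.
bound = ring_step_times(compute_s=2.0, comm_s=3.0, num_devices=8)
```

In the first case the overlapped time equals the pure compute time, which is what "hidden" means here; in the second, each of the seven transfers leaves one second of communication exposed.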