Compute-Communication Overlap
The simultaneous execution of computation and communication. While GPU i computes blockwise attention, it also sends its current KV chunk to GPU i+1 and receives the next chunk from GPU i-1. The transfer cost is fully hidden when, at each step, compute time ≥ communication time; otherwise the GPU stalls until the next chunk arrives.
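
The overlap condition can be illustrated with a small timing model. This is only a sketch under simplifying assumptions: each step has a single compute time and a single transfer time, and overlap is perfect. The function names `step_time` and `exposed_comm` are hypothetical and not part of any library.

```python
def step_time(compute_s: float, comm_s: float, overlap: bool = True) -> float:
    """Wall-clock time of one step, in seconds (idealized model).

    With overlap, compute and transfer run concurrently, so the step
    takes as long as the slower of the two. Without overlap, they run
    back to back and their times add.
    """
    if overlap:
        return max(compute_s, comm_s)
    return compute_s + comm_s


def exposed_comm(compute_s: float, comm_s: float) -> float:
    """Transfer time that is NOT hidden behind compute (0 if fully hidden)."""
    return max(0.0, comm_s - compute_s)


# Hypothetical timings: 4 s of attention compute, 3 s to move one KV chunk.
print(step_time(4.0, 3.0))                 # overlapped
print(step_time(4.0, 3.0, overlap=False))  # serialized
print(exposed_comm(4.0, 3.0))              # transfer fully hidden
print(exposed_comm(2.0, 3.0))              # compute too short to hide it
```

In this model the transfer is free whenever `compute_s >= comm_s`; when compute is shorter, the difference shows up as stall time at every step.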