Ring Attention
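A distributed attention algorithm in which P GPUs are arranged in a ring topology. Each GPU holds one chunk of the sequence (n/P tokens): the queries for those tokens and the matching chunk of the KV cache. In each round, every GPU computes blockwise attention between its local queries and the KV chunk it currently holds, then passes that chunk to its neighbor in the ring. Partial results are merged with a running (online) softmax, so after P rounds every GPU has computed exact full attention for its own queries. Because both memory and computation are distributed across GPUs, each GPU stores only n/P tokens of KV at a time rather than all n, which is what enables very long contexts (1M+ tokens).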
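
The rounds described above can be illustrated with a minimal single-process sketch in NumPy. It only simulates the ring: the "devices" are list entries, and rotating the list stands in for the GPU-to-GPU transfer. The function names (`full_attention`, `ring_attention_sim`), the non-causal single-head setup, and the requirement that n be divisible by P are assumptions made for this illustration, not part of any particular library.

```python
import numpy as np


def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the whole sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def ring_attention_sim(q, k, v, num_devices):
    """Simulate ring attention for P = num_devices in a single process."""
    n, d = q.shape
    assert n % num_devices == 0, "sketch assumes n is divisible by P"
    chunk = n // num_devices

    # Device i owns the queries and the K/V chunk for its n/P tokens.
    q_chunks = np.split(q, num_devices)
    kv_chunks = list(zip(np.split(k, num_devices), np.split(v, num_devices)))

    # Per-device online-softmax state: running max, normalizer, weighted sum.
    run_max = [np.full((chunk, 1), -np.inf) for _ in range(num_devices)]
    run_sum = [np.zeros((chunk, 1)) for _ in range(num_devices)]
    run_out = [np.zeros((chunk, v.shape[1])) for _ in range(num_devices)]

    for _ in range(num_devices):  # P rounds
        for i in range(num_devices):
            k_blk, v_blk = kv_chunks[i]  # KV chunk currently held by device i
            scores = q_chunks[i] @ k_blk.T / np.sqrt(d)
            new_max = np.maximum(run_max[i], scores.max(axis=-1, keepdims=True))
            scale = np.exp(run_max[i] - new_max)  # rescale earlier partial sums
            probs = np.exp(scores - new_max)
            run_sum[i] = run_sum[i] * scale + probs.sum(axis=-1, keepdims=True)
            run_out[i] = run_out[i] * scale + probs @ v_blk
            run_max[i] = new_max
        # Ring step: device i hands its KV chunk to device (i + 1) % P.
        kv_chunks = kv_chunks[-1:] + kv_chunks[:-1]

    # Normalize each device's accumulated output and stitch the chunks together.
    return np.concatenate([o / s for o, s in zip(run_out, run_sum)])
```

Calling `ring_attention_sim(q, k, v, num_devices=4)` on random `(16, 8)` arrays should agree with `full_attention(q, k, v)` up to floating-point tolerance, since the online softmax only reorders the same sums. In a real multi-GPU setting the list rotation would be replaced by a point-to-point send/receive, typically overlapped with the blockwise computation.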