Blockwise Attention

Appears in 1 paper

Computing attention in blocks (query chunk × KV chunk) rather than over the full attention matrix at once.
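To make the block decomposition concrete, here is a sketch for a single query row $q$ with keys $k_j$, values $v_j$ and head dimension $d$. It shows how partial results from successive KV blocks $B_1, \dots, B_T$ can be merged exactly using the standard online-softmax recurrence, starting from $m_0 = -\infty$, $\ell_0 = 0$, $a_0 = 0$. The symbols $s_j$, $m_t$, $\ell_t$, $a_t$ and $B_t$ are illustrative notation chosen here, not taken from the paper.

```latex
\begin{aligned}
s_j &= \frac{q \cdot k_j}{\sqrt{d}}, \qquad o = \frac{\sum_j e^{s_j}\, v_j}{\sum_j e^{s_j}} \\
m_t &= \max\!\Big(m_{t-1},\ \max_{j \in B_t} s_j\Big) \\
\ell_t &= e^{m_{t-1} - m_t}\,\ell_{t-1} + \sum_{j \in B_t} e^{s_j - m_t} \\
a_t &= e^{m_{t-1} - m_t}\,a_{t-1} + \sum_{j \in B_t} e^{s_j - m_t}\, v_j \\
\ell_t &= \sum_{j \in B_1 \cup \cdots \cup B_t} e^{s_j - m_t}, \qquad
a_t = \sum_{j \in B_1 \cup \cdots \cup B_t} e^{s_j - m_t}\, v_j \\
o &= \frac{a_T}{\ell_T}
\end{aligned}
```

The second-to-last line is the invariant that holds by induction over $t$; multiplying the numerator and denominator of $o$ by $e^{-m_T}$ then gives the last line, so the blockwise result is exact in real arithmetic. Subtracting the running maximum keeps every exponent at or below zero, which is what keeps the computation numerically stable.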

As used in Paper 19: Ring Attention with Blockwise Transformers for Near-Infinite Context

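Splits attention into query chunk × KV chunk blocks. With an online softmax, which tracks a running maximum and normalizer per query row and rescales the partial output as each KV block is added, the result is mathematically identical to full attention; any differences come only from floating-point rounding. Essential for Ring Attention, where each GPU holds its own Q chunk and processes KV chunks sequentially as they are passed from device to device around a ring.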
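A minimal pure-Python simulation of this schedule may help. It assumes equal-sized chunks (sequence length divisible by the number of devices), one Q chunk and one KV chunk per device, and no causal mask. The functions `full_attention`, `ring_attention` and `_scores` are hypothetical illustrations, not the paper's code, and the "devices" are simulated one after another in a single process.

```python
import math
import random


def _scores(q, keys):
    """Scaled dot-product scores q . k / sqrt(d) for one query row."""
    d = len(q)
    return [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]


def full_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, computed over all keys at once."""
    out = []
    for q in Q:
        s = _scores(q, K)
        m = max(s)
        w = [math.exp(x - m) for x in s]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out


def ring_attention(Q, K, V, n_devices):
    """Hypothetical simulation: device i owns Q chunk i; KV chunks rotate around the ring.

    Assumes len(Q) == len(K) == len(V), divisible by n_devices, and no causal mask.
    """
    if len(Q) % n_devices:
        raise ValueError("sequence length must be divisible by n_devices")
    size = len(Q) // n_devices
    dv = len(V[0])
    q_chunks = [Q[i * size:(i + 1) * size] for i in range(n_devices)]
    kv_held = [(K[i * size:(i + 1) * size], V[i * size:(i + 1) * size]) for i in range(n_devices)]
    # Online-softmax state per device and query row: (running max, normalizer, accumulator).
    state = [[(-math.inf, 0.0, [0.0] * dv) for _ in range(size)] for _ in range(n_devices)]

    for _ in range(n_devices):  # after n_devices steps every device has seen every KV chunk
        for dev in range(n_devices):
            K_blk, V_blk = kv_held[dev]  # the KV chunk this device currently holds
            for r, q in enumerate(q_chunks[dev]):
                m, norm, acc = state[dev][r]
                s = _scores(q, K_blk)
                m_new = max(m, max(s))
                scale = math.exp(m - m_new)  # rescales old state; exp(-inf) == 0.0 on first block
                p = [math.exp(x - m_new) for x in s]
                norm = scale * norm + sum(p)
                acc = [scale * acc[c] + sum(pj * v[c] for pj, v in zip(p, V_blk)) for c in range(dv)]
                state[dev][r] = (m_new, norm, acc)
        # Pass KV chunks one hop around the ring: device i receives from device i - 1.
        kv_held = [kv_held[(dev - 1) % n_devices] for dev in range(n_devices)]

    # Final normalization o = acc / norm, rows returned in original sequence order.
    return [[a_c / norm for a_c in acc] for dev in range(n_devices) for (_, norm, acc) in state[dev]]
```

For example, `ring_attention(Q, K, V, 4)` on 8-row inputs gives each of 4 simulated devices 2 query rows and should match `full_attention(Q, K, V)` up to rounding. Keeping the per-row state on the device that owns the queries is what lets only the KV chunks move; in a real deployment the send to the next device would overlap with the block computation instead of running after it.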