Online Softmax

Appears in 1 paper

An incremental softmax computation, based on the log-sum-exp trick, that maintains two running statistics (the maximum score and the sum of exponentials) as blocks of scores are processed.

As used in Paper 19, Ring Attention with Blockwise Transformers for Near-Infinite Context:

Keeping a running maximum and a running sum of exponentials allows a numerically stable softmax to be computed across blocks without recomputing from scratch: when a new block raises the maximum, the accumulated sum is rescaled to the new maximum instead of being rebuilt. This is critical for correctness in Ring Attention.
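
The running-statistics update can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names (online_softmax_stats, blockwise_softmax, reference_softmax) and the block_size parameter are hypothetical, and the scores are assumed to be finite floats.

```python
import math


def online_softmax_stats(blocks):
    """Single pass over blocks of scores; returns (running_max, running_sum).

    running_sum equals sum(exp(x - running_max)) over every score seen so far.
    Assumes all scores are finite (hypothetical helper for illustration).
    """
    m = -math.inf  # running maximum
    s = 0.0        # running sum of exp(x - m)
    for block in blocks:
        if not block:
            continue  # nothing to update for an empty block
        m_new = max(m, max(block))
        # Rescale the old sum to the new maximum, then add this block's terms.
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in block)
        m = m_new
    return m, s


def blockwise_softmax(scores, block_size):
    """Softmax whose normalizer is accumulated block by block."""
    blocks = [scores[i:i + block_size] for i in range(0, len(scores), block_size)]
    m, s = online_softmax_stats(blocks)
    return [math.exp(x - m) / s for x in scores]


def reference_softmax(scores):
    """Ordinary stable softmax over the full list, used only for comparison."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Under these assumptions, blockwise_softmax(scores, block_size) should agree with reference_softmax(scores) up to floating-point rounding for any block size. In an attention setting the same rescaling factor, exp(m - m_new), would also be applied to the running weighted sum of values; that part is left out of this sketch.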