Numerical Stability

Appears in 1 paper

Ensuring computed values don't overflow, underflow, or lose precision.

As used in Paper 19: Ring Attention with Blockwise Transformers for Near-Infinite Context

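Online softmax is numerically more stable than a naive blockwise softmax. The naive version exponentiates raw attention scores, so a large score can overflow; the online version tracks a running maximum, subtracts it before exponentiating, and rescales the partial sums whenever the maximum changes. This matters for correctness in Ring Attention, especially in float16, whose largest finite value is 65504, so exp(x) already overflows for x above roughly 11.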
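
The difference can be illustrated with a minimal sketch in plain Python. It is not taken from the paper: the function names `naive_softmax` and `online_softmax` are hypothetical, the arithmetic is float64 rather than float16, and only the softmax normalizer is shown (a full blockwise attention would also have to rescale its running output). Each block is assumed to be non-empty.

```python
import math


def naive_softmax(scores):
    """Softmax without max subtraction; math.exp overflows on large scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(blocks):
    """Softmax over the concatenation of `blocks`, visiting one block at a time.

    Assumes every block is a non-empty list of finite floats.
    """
    running_max = -math.inf
    running_sum = 0.0  # sum of exp(s - running_max) over the scores seen so far
    for block in blocks:
        new_max = max(running_max, max(block))
        # Rescale the old sum to the new reference maximum, then add this block.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(s - new_max) for s in block
        )
        running_max = new_max
    # Every exponent below is <= 0, so no exponential can overflow.
    return [
        math.exp(s - running_max) / running_sum
        for block in blocks
        for s in block
    ]


if __name__ == "__main__":
    blocks = [[1000.0, 999.0], [998.0, 1001.0]]
    print(online_softmax(blocks))
    try:
        naive_softmax([s for b in blocks for s in b])
    except OverflowError as err:
        print("naive softmax failed:", err)
```

With scores near 1000, the naive version raises `OverflowError` even in float64, while the online version only ever exponentiates non-positive numbers. In float16 the same failure would appear at far smaller scores.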