Online Softmax
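An incremental softmax computation, based on the log-sum-exp trick, that maintains two running statistics (the maximum seen so far and the sum of exponentials relative to that maximum) as blocks are processed. When a new block raises the maximum, the accumulated sum is rescaled by exp(old max - new max), so the final result is numerically stable and matches a softmax over the full input without recomputing from scratch. Critical for correctness in Ring Attention, where each block of scores arrives separately.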
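The update rule can be illustrated with a minimal pure-Python sketch. The function name `online_softmax` and the list-of-lists input are assumptions made for illustration, not part of any particular Ring Attention implementation; real kernels operate on tensors and typically apply the same rescaling to a running weighted sum of values as well.

```python
import math


def online_softmax(blocks):
    """Softmax over the concatenation of `blocks`, read one block at a time.

    `blocks` is a list of lists of floats (a hypothetical input format).
    """
    running_max = -math.inf  # largest value seen so far
    running_sum = 0.0        # sum of exp(x - running_max) over values seen so far

    # Pass 1: update the running statistics block by block.
    for block in blocks:
        if not block:
            continue
        new_max = max(running_max, max(block))
        # Rescale the old sum to the new reference maximum,
        # then add the contribution of the current block.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in block
        )
        running_max = new_max

    # Pass 2: normalise every element with the final statistics.
    return [math.exp(x - running_max) / running_sum for block in blocks for x in block]
```

Because every exponent is taken relative to the current maximum, inputs such as `[[1.0, 2.0], [1000.0]]` do not overflow, and the result should agree with a softmax computed over the flattened input up to floating-point rounding.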