Inference vs Training
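Inference: tokens are generated one at a time (autoregressive decoding); each new token attends to all tokens before it.
Training: the full sequence is processed in a single parallel pass under a causal mask. The model is still autoregressive; only the computation is parallel, because the target tokens are already known.
Ring Attention applies to both. It splits the sequence into blocks across devices and passes key/value blocks around a ring, so the context length that fits in memory grows with the number of devices. A longer context can improve predictions, but that is a quality gain, not an efficiency gain. The efficiency benefit is likely largest where many query tokens are processed at once (training, and prompt processing at inference time), since token-by-token decoding has only one query per step and therefore little computation to overlap with the ring communication.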
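
The ring pattern can be sketched in a single process. The code below is an illustrative simulation, not a distributed implementation: the function names, the `num_devices` parameter, and the use of NumPy are all assumptions made for this example. Each simulated device keeps its own query block, receives one key/value block per ring step, and merges the partial results with a running softmax (running max and running sum), which should reproduce ordinary causal attention.

```python
import numpy as np

def full_causal_attention(q, k, v):
    """Reference: softmax(Q K^T / sqrt(d)) V with a causal mask."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def ring_causal_attention(q, k, v, num_devices):
    """Simulate ring-style blockwise attention on one machine (hypothetical helper)."""
    n, d = q.shape
    assert n % num_devices == 0, "sequence must split evenly across devices"
    b = n // num_devices
    q_blocks = q.reshape(num_devices, b, d)
    k_blocks = k.reshape(num_devices, b, d)
    v_blocks = v.reshape(num_devices, b, d)
    out = np.zeros_like(q_blocks)

    for dev in range(num_devices):
        q_pos = dev * b + np.arange(b)      # global positions of this device's queries
        m = np.full((b, 1), -np.inf)        # running row-wise max of the scores
        l = np.zeros((b, 1))                # running softmax denominator
        acc = np.zeros((b, d))              # running unnormalized output

        # Step 0 uses the device's own K/V block; each later step uses the
        # block that would arrive from the previous neighbor in the ring.
        for step in range(num_devices):
            src = (dev - step) % num_devices
            k_pos = src * b + np.arange(b)
            s = q_blocks[dev] @ k_blocks[src].T / np.sqrt(d)
            # Causal mask: a query may only see keys at or before its position.
            s = np.where(q_pos[:, None] >= k_pos[None, :], s, -np.inf)

            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            scale = np.exp(m - m_new)       # rescale what was accumulated so far
            p = np.exp(s - m_new)
            l = l * scale + p.sum(axis=-1, keepdims=True)
            acc = acc * scale + p @ v_blocks[src]
            m = m_new

        out[dev] = acc / l
    return out.reshape(n, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
assert np.allclose(ring_causal_attention(q, k, v, num_devices=4),
                   full_causal_attention(q, k, v))
```

In a real multi-device setup the inner loop would overlap the transfer of the next key/value block with the computation on the current one. With a single query row, as in token-by-token decoding, each step does far less work, which is the reason for the caveat above.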