SRAM (Static RAM)

Appears in 1 paper

Small, very fast on-chip GPU memory (e.g., 192KB per streaming multiprocessor on an NVIDIA A100).

As used in Paper 21 (Mamba: Linear-Time Sequence Modeling with Selective State Spaces):

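Small, very fast on-chip GPU memory (e.g., 192KB per streaming multiprocessor on an NVIDIA A100). Compute units read their operands from on-chip memory, so data has to be staged in SRAM before it can be used. If an algorithm's working set doesn't fit in SRAM, intermediate results spill to the much larger but slower HBM, and the kernel pays for every extra round trip. FlashAttention and Mamba are both designed around this constraint: they keep intermediates in SRAM and minimise HBM reads and writes.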
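To make "fits in SRAM" concrete, here is a rough back-of-the-envelope sketch in Python. It takes the 192KB figure from this entry as the budget; the tile shapes, the sequence length of 4096, and the helper names `tile_bytes` and `fits_in_sram` are illustrative assumptions, not values or code from either paper. It also ignores registers and any configurable split of on-chip memory, so treat it as an estimate only.

```python
# Budget quoted in this entry (assumed here to be per streaming multiprocessor).
SRAM_BYTES = 192 * 1024


def tile_bytes(rows: int, cols: int, bytes_per_elem: int = 2) -> int:
    """Size in bytes of a dense rows x cols tile (default: 2-byte fp16 elements)."""
    return rows * cols * bytes_per_elem


def fits_in_sram(working_set_bytes: int, budget: int = SRAM_BYTES) -> bool:
    """True if the working set fits within the assumed SRAM budget."""
    return working_set_bytes <= budget


# Hypothetical tiled attention step: three 128 x 64 fp16 tiles (Q, K, V)
# plus one 128 x 128 fp32 tile of scores.
tiled = 3 * tile_bytes(128, 64, 2) + tile_bytes(128, 128, 4)

# Hypothetical untiled alternative: a full 4096 x 4096 fp16 score matrix.
full_scores = tile_bytes(4096, 4096, 2)

print(f"tiled working set: {tiled} bytes, fits: {fits_in_sram(tiled)}")
print(f"full score matrix: {full_scores} bytes, fits: {fits_in_sram(full_scores)}")
```

Under these assumptions the tiled working set is about 112KB and fits, while the full score matrix is 32MB and would have to live in HBM. That gap is why tiling, or otherwise avoiding large materialised intermediates, matters.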
