SRAM (Static RAM)
Tiny, ultra-fast on-GPU cache (e.g., 192KB per core).
Tiny, ultra-fast on-GPU cache (e.g., 192KB per core). All computation happens here. If your algorithm doesn't fit in SRAM, you pay HBM-access penalties. Flash Attention and Mamba both optimise for this.