HBM (High Bandwidth Memory)
High-capacity GPU memory (e.g., 80 GB on an H100) with lower bandwidth than on-chip SRAM. Modern transformers and Mamba spend significant time shuttling data between HBM and the faster SRAM, so efficient algorithms minimise HBM-SRAM communication.
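
To make the cost of HBM-SRAM communication concrete, here is a minimal back-of-the-envelope sketch in Python. It compares the HBM traffic of a chain of elementwise operations run as separate kernels (each one reads its input from HBM and writes its output back) against a single fused kernel that keeps intermediates on-chip. The function names, the 2-byte element size, and the bandwidth figure are illustrative assumptions, not measurements or values from any vendor specification.

```python
def hbm_bytes_unfused(n_elements, n_ops, bytes_per_element=2):
    """HBM traffic when each of n_ops elementwise kernels reads its
    input from HBM and writes its output back to HBM."""
    return n_ops * 2 * n_elements * bytes_per_element


def hbm_bytes_fused(n_elements, bytes_per_element=2):
    """HBM traffic when all ops are fused into one kernel: one read of
    the input, one write of the final output, intermediates stay in SRAM."""
    return 2 * n_elements * bytes_per_element


def transfer_time_ms(n_bytes, bandwidth_bytes_per_s):
    """Lower bound on transfer time, ignoring latency and compute."""
    return n_bytes / bandwidth_bytes_per_s * 1e3


if __name__ == "__main__":
    # Assumed, illustrative HBM bandwidth (bytes per second); substitute
    # the figure for your own hardware.
    ASSUMED_HBM_BANDWIDTH = 3e12

    n = 1_000_000_000  # one billion 2-byte elements
    unfused = hbm_bytes_unfused(n, n_ops=3)
    fused = hbm_bytes_fused(n)

    print(f"unfused: {unfused / 1e9:.1f} GB moved, "
          f"{transfer_time_ms(unfused, ASSUMED_HBM_BANDWIDTH):.2f} ms")
    print(f"fused:   {fused / 1e9:.1f} GB moved, "
          f"{transfer_time_ms(fused, ASSUMED_HBM_BANDWIDTH):.2f} ms")
```

Under this simple model, fusing k elementwise operations cuts HBM traffic by a factor of k, which is the basic reason kernel fusion and similar IO-aware techniques help memory-bound workloads.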