KV Cache

Appears in 1 paper

The memory buffer storing Key and Value vectors from all previous tokens during autoregressive (token-by-token) generation.

As used in Paper 18: Mistral 7B

In standard Multi-Head Attention, the KV cache grows linearly with sequence length, because every previous token keeps one Key and one Value vector per attention head in every layer. It also grows linearly with batch size, number of layers, number of heads, and head dimension. At long sequence lengths or large batch sizes, the KV cache, rather than the model weights, is often the memory bottleneck in LLM inference.
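
As a rough illustration of how the cache size scales, the sketch below estimates its memory footprint from the model shape. The function name `kv_cache_bytes` and the configuration values (32 layers, 32 heads, head dimension 128, 16-bit values) are assumptions chosen for a 7B-scale example, not figures taken from this entry.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes (hypothetical helper, for illustration)."""
    # Factor 2: one Key and one Value vector per token, per layer, per KV head.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


GIB = 1024 ** 3

# Assumed 7B-scale configuration with standard Multi-Head Attention,
# where every one of the 32 heads has its own Key/Value vectors.
base = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
print(f"8192 tokens:  {base / GIB:.1f} GiB")

# Doubling the sequence length doubles the cache: growth is linear.
longer = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=16384)
print(f"16384 tokens: {longer / GIB:.1f} GiB")
```

Under these assumptions a single 8192-token sequence needs about 4 GiB of cache, and batching several such sequences multiplies that figure, which is how the cache can come to rival the weights in memory use.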