Query Head

Appears in 1 paper

In multi-head attention, one of the n_heads parallel attention heads, each with its own learned query projection.

As used in Paper 18 — Mistral 7B →

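In multi-head attention, one of the n_heads parallel attention heads, each with its own learned query projection. Each query head can learn to attend to different patterns in the input. In grouped-query attention (GQA), several query heads share a single key-value (KV) head, which shrinks the KV cache while each query head still computes its own attention pattern. Mistral 7B uses 32 query heads and 8 KV heads, so four query heads share each KV head.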
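The sharing of KV heads across query heads can be illustrated with a minimal NumPy sketch. This is not the Mistral implementation: the function name `grouped_query_attention`, the tensor layout, and the toy sizes are assumptions chosen for illustration, and causal masking, sliding-window attention, and batching are omitted for brevity.

```python
import numpy as np


def grouped_query_attention(q, k, v):
    """Toy grouped-query attention (illustrative, unmasked, unbatched).

    q:    (n_heads,    seq_len, head_dim)  one slice per query head
    k, v: (n_kv_heads, seq_len, head_dim)  one slice per shared KV head
    """
    n_heads, seq_len, head_dim = q.shape
    n_kv_heads = k.shape[0]
    assert n_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group_size = n_heads // n_kv_heads

    # Query head h reads KV head h // group_size: repeat each KV head
    # group_size times so the shapes line up with the query heads.
    k_rep = np.repeat(k, group_size, axis=0)
    v_rep = np.repeat(v, group_size, axis=0)

    # Scaled dot-product attention, computed independently per query head.
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_rep, weights


# Toy sizes (assumed): 8 query heads sharing 2 KV heads, so 4 query heads per KV head.
rng = np.random.default_rng(0)
n_heads, n_kv_heads, seq_len, head_dim = 8, 2, 5, 4
q = rng.standard_normal((n_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

out, weights = grouped_query_attention(q, k, v)
print(out.shape)      # (8, 5, 4)
print(weights.shape)  # (8, 5, 5)
```

Only `k` and `v` would need to be cached during generation, so under these assumptions the cache holds n_kv_heads rather than n_heads slices per layer, a reduction by a factor of n_heads / n_kv_heads. Query heads 0 through 3 read the same keys and values here, yet each produces its own attention weights because its queries differ.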