Rotary Position Embeddings (RoPE)

Appears in 1 paper

A method of encoding token position information by rotating query and key vectors.
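For reference, here is the rotation written per pair of dimensions. This follows the original RoFormer formulation, which pairs adjacent dimensions and uses base 10000; treat both as assumptions, since implementations differ in pairing convention and base. The same rotation $R_m$ is applied to keys. Because $R_m^\top R_n = R_{n-m}$, the attention score depends on the positions $m$ and $n$ only through their offset.

```latex
\begin{pmatrix} q'_{2i} \\ q'_{2i+1} \end{pmatrix}
= \begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} q_{2i} \\ q_{2i+1} \end{pmatrix},
\qquad \theta_i = 10000^{-2i/d}

\langle R_m q,\; R_n k \rangle = q^\top R_m^\top R_n k = q^\top R_{n-m} k
```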

As used in Paper 18: Mistral 7B

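Mistral uses RoPE instead of absolute position embeddings. RoPE adds no learned position parameters. Each query and key is rotated by an angle proportional to its position, so their dot product depends on the two tokens' positions only through the relative offset between them. This tends to let RoPE generalise better than absolute embeddings to sequences longer than those seen in training. In Mistral, which also uses sliding window attention (SWA), extrapolation is still limited.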
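The relative-position property can be checked numerically. The following is a minimal NumPy sketch, not Mistral's implementation. It assumes interleaved dimension pairs and base 10000, as in the original RoFormer formulation. Some libraries instead pair dimension i with dimension i + d/2. The names `rope` and `score` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, d), d even.

    Assumption: adjacent dimensions (2i, 2i+1) form each rotated pair,
    and base=10000 (the RoFormer default); real models may differ.
    """
    x = np.asarray(x, dtype=float)
    seq_len, d = x.shape
    assert d % 2 == 0, "head dimension must be even"
    # One frequency per dimension pair: theta_i = base^(-2i/d)
    inv_freq = base ** (-np.arange(0, d, 2) / d)      # shape (d/2,)
    angles = np.outer(positions, inv_freq)            # shape (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Standard 2D rotation of each (even, odd) pair by its angle
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal((1, d))
k = rng.standard_normal((1, d))


def score(m: int, n: int) -> float:
    """Dot product of q placed at position m and k placed at position n."""
    qm = rope(q, np.array([m]))
    kn = rope(k, np.array([n]))
    return float(qm[0] @ kn[0])


# Same offset (n - m = 3) at different absolute positions: scores match
print(score(2, 5), score(10, 13))
```

The rotation is orthogonal, so it changes direction but never the vector's length. This is one reason RoPE can inject position without distorting the scale of the attention scores.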