Layer Normalisation (Layer Norm)

Appears in 1 paper

As used in Paper 08 — Attention Is All You Need →

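Applied after each sub-layer (attention or feed-forward), on the sum of the sub-layer's output and its residual input: LayerNorm(x + Sublayer(x)). It normalises a single position's d_model-dimensional vector to mean 0 and standard deviation 1, then scales by a learned γ and shifts by a learned β. Because the statistics are computed per position, with no dependence on other positions or on other examples in the batch, it is compatible with variable-length sequences and batch-size-1 inference. See the Normalisation tutorial.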
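
The per-position computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `layer_norm`, the `eps` stabiliser and its value, and the toy sizes are all assumptions made for the example.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise each position's feature vector (last axis), then scale and shift.

    eps is an assumed small constant that avoids division by zero.
    """
    mean = x.mean(axis=-1, keepdims=True)    # one mean per position
    var = x.var(axis=-1, keepdims=True)      # one variance per position
    x_hat = (x - mean) / np.sqrt(var + eps)  # mean 0, std ~1 along d_model
    return gamma * x_hat + beta              # learned scale (gamma) and shift (beta)

# Toy sizes chosen for illustration only.
d_model = 8
rng = np.random.default_rng(0)
gamma = np.ones(d_model)   # a typical initial value for the scale
beta = np.zeros(d_model)   # a typical initial value for the shift

# Batch size 1, sequence length 5: no batch statistics are needed.
x = rng.normal(loc=3.0, scale=2.0, size=(1, 5, d_model))
y = layer_norm(x, gamma, beta)

# Truncating the sequence leaves the remaining positions unchanged,
# because each position is normalised on its own.
y_short = layer_norm(x[:, :3], gamma, beta)
```

With γ = 1 and β = 0, every position of `y` has mean close to 0 and standard deviation close to 1 along the feature axis, and `y_short` matches the first three positions of `y`.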
