Layer Normalisation (Layer Norm)
Applied after each sub-layer. Normalises each position's d_model-dimensional vector to zero mean and unit variance, then applies a learned elementwise scale γ and shift β. Because the statistics are computed per position, independently of other positions and of other examples in the batch, it works with variable-length sequences and with batch-size-1 inference. See the Normalisation tutorial.
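
The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation: the function name layer_norm, the small constant eps added for numerical stability, and the toy shapes are all assumptions made for the example.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise the last axis (d_model) of x, then scale and shift.

    eps is an assumed stabiliser that avoids division by zero.
    """
    # Statistics are taken over the feature axis only, one set per position.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta have shape (d_model,) and broadcast over batch and position.
    return gamma * x_hat + beta

d_model = 8
rng = np.random.default_rng(0)
gamma = np.ones(d_model)   # learned scale, shown here at an identity setting
beta = np.zeros(d_model)   # learned shift, shown here at zero

# Batch size 1, sequence length 5: no batch statistics are needed.
x = rng.normal(loc=3.0, scale=2.0, size=(1, 5, d_model))
y = layer_norm(x, gamma, beta)

# Truncating the sequence leaves the remaining positions unchanged.
y_short = layer_norm(x[:, :3], gamma, beta)
```

Since each position is normalised using only its own features, y_short matches the first three positions of y, which is why sequence length and batch size do not affect the result.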