Attention weight (αₜᵢ)

Appears in 2 papers

A probability-like value between 0 and 1 that measures how much the decoder, at decoding step t, focuses on source position i.

As used in Paper 07 — Neural Machine Translation by Jointly Learning to Align and Translate →

A probability-like value between 0 and 1 that measures how much the decoder, at decoding step t, focuses on source position i. All attention weights for a given step sum to 1. They are computed by applying softmax to the raw alignment scores for that step.
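
The softmax step can be illustrated with a minimal sketch. The function name `attention_weights` and the score values are hypothetical, chosen only for illustration; in the paper the raw alignment scores come from a learned alignment model, which is not shown here.

```python
import math

def attention_weights(scores):
    """Softmax over the raw alignment scores for one decoding step t.

    Returns one weight per source position i.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw alignment scores for a 4-token source sentence
# at a single decoding step t (made-up values, not from the paper).
scores_t = [2.0, 0.5, -1.0, 0.5]
alpha_t = attention_weights(scores_t)

for i, a in enumerate(alpha_t):
    print(f"source position {i}: weight {a:.3f}")
print("sum of weights:", sum(alpha_t))
```

The source position with the highest raw score receives the largest weight, and the weights for the step sum to 1 up to floating-point rounding.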

As used in Paper 08 — Attention Is All You Need →

A probability-like value between 0 and 1, produced by applying softmax to the scaled attention scores. It represents how much one position attends to another, and all weights for a given query position sum to 1. Collected over all query and key positions, the weights form the (T × T) attention weight matrix A.
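
As a rough sketch of how the matrix A could be built, the example below computes scaled dot-product scores and applies softmax to each row. The helper names (`softmax`, `attention_weight_matrix`) and the toy query and key vectors are assumptions for illustration; real models obtain queries and keys from learned projections and use batched tensor operations rather than Python lists.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weight_matrix(Q, K):
    """Return A, where A[q][k] is how much query position q attends to key position k.

    Scores are dot products divided by sqrt(d_k), then softmaxed per query row.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    A = []
    for q in Q:
        scaled_scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        A.append(softmax(scaled_scores))
    return A

# Hypothetical toy input: T = 3 positions, key dimension d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = attention_weight_matrix(Q, K)

for row in A:
    print([round(a, 3) for a in row], "row sum:", round(sum(row), 6))
```

Each row of A corresponds to one query position, so the rows (not the columns) are what sum to 1.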