Cross-attention
Attention where the query comes from one sequence (the decoder) and the keys and values come from another sequence (the encoder). Bahdanau attention is cross-attention. Distinct from self-attention (Paper 08), where the query, keys, and values all come from the same sequence.
Attention where the Query comes from the decoder and the Keys and Values come from the encoder. The same idea as Bahdanau attention (each decoder position attends over the encoder outputs), but computed in matrix form for all decoder positions simultaneously. This replaces the step-by-step attention computation of Paper 07.
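As a rough illustration of the matrix form, here is a minimal NumPy sketch. It assumes scaled dot-product scoring, which this entry does not specify (Bahdanau's original formulation scores with a small additive network instead). The function name cross_attention, the projection matrices W_q, W_k, W_v, and all sizes are hypothetical, chosen only for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, W_q, W_k, W_v):
    """Hypothetical single-head cross-attention (assumes dot-product scoring)."""
    Q = decoder_states @ W_q                  # queries from the decoder: (T_dec, d_k)
    K = encoder_states @ W_k                  # keys from the encoder:    (T_enc, d_k)
    V = encoder_states @ W_v                  # values from the encoder:  (T_enc, d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # all decoder positions at once: (T_dec, T_enc)
    weights = softmax(scores, axis=-1)        # each row is a distribution over encoder positions
    return weights @ V, weights               # context vectors: (T_dec, d_k)

# Illustrative sizes only.
rng = np.random.default_rng(0)
T_dec, T_enc, d_model, d_k = 3, 5, 8, 4
decoder_states = rng.normal(size=(T_dec, d_model))
encoder_states = rng.normal(size=(T_enc, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

context, weights = cross_attention(decoder_states, encoder_states, W_q, W_k, W_v)
```

Each row of weights corresponds to what a step-by-step decoder would compute at one time step. The matrix product simply produces all rows in one operation.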