Encoder-decoder
The Transformer's two-part structure for sequence-to-sequence (seq2seq) tasks such as translation. The encoder processes the full source sequence with unmasked self-attention, so every position can attend to every other position. The decoder generates the target sequence autoregressively, one token at a time. It attends both to its own previous outputs (masked self-attention, which blocks attention to later positions) and to the encoder's outputs (cross-attention).
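A minimal sketch of the three attention patterns described above, in plain Python. It is a simplification, not the full Transformer. It assumes a single head and no learned query/key/value projections, so the vectors are used directly as Q, K and V. It also leaves out residual connections, layer normalization, feed-forward sublayers and layer stacking. The function names (`attention`, `causal_mask`, `encode`, `decode`) and the toy embeddings are hypothetical, chosen for illustration only.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; masked scores are -inf and become 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention on plain lists of vectors.

    mask[i][j] is True when query i may attend to key j (None = no mask).
    Returns (outputs, weights) so the attention pattern can be inspected.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if mask is not None and not mask[i][j]:
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

def causal_mask(n):
    # Position i may only see positions 0..i (no peeking at future tokens).
    return [[j <= i for j in range(n)] for i in range(n)]

def encode(src):
    # Encoder: unmasked self-attention, so every source position sees every other.
    return attention(src, src, src)

def decode(tgt, memory):
    # 1) Masked self-attention over the target tokens generated so far.
    h, self_w = attention(tgt, tgt, tgt, mask=causal_mask(len(tgt)))
    # 2) Cross-attention: queries from the decoder, keys/values from the encoder.
    out, cross_w = attention(h, memory, memory)
    return out, self_w, cross_w

# Toy 2-dimensional embeddings (made-up values for illustration).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 source tokens
tgt = [[0.5, 0.5], [1.0, 0.0]]               # 2 target tokens so far

memory, enc_w = encode(src)
out, self_w, cross_w = decode(tgt, memory)
```

In this sketch, every entry of `enc_w` is nonzero because encoder positions see each other freely. The first row of `self_w` is `[1.0, 0.0]` because the causal mask hides the later target token. `cross_w` has one row per target token and one column per source token. Because of the mask, the decoder output for position 0 does not change when later target tokens are appended. This is what makes token-by-token autoregressive generation consistent.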