Encoder
The first half of the seq2seq architecture. An RNN/LSTM that reads the
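A Transformer component that processes the full input sequence with no causal masking, so every token can attend to every other token, both earlier and later in the sequence. Contrast with the decoder, whose self-attention masks future tokens so that each position sees only itself and earlier positions. BERT is an encoder-only model.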
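The masking contrast between encoder and decoder can be illustrated with a small sketch. This is a toy example in plain Python, not taken from any particular library: the helper names `attention_mask` and `masked_softmax` are hypothetical, and the uniform scores are made up so that the effect of the mask is easy to see.

```python
import math

def attention_mask(seq_len, causal):
    """Return mask[i][j] = True when position i may attend to position j.

    causal=False: encoder-style, every position sees every position.
    causal=True:  decoder-style, position i sees only positions j <= i.
    """
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax over scores; masked-out positions get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform toy scores for a 3-token sequence (illustrative values only).
scores = [[0.0] * 3 for _ in range(3)]

encoder_weights = masked_softmax(scores, attention_mask(3, causal=False))
decoder_weights = masked_softmax(scores, attention_mask(3, causal=True))

for name, weights in (("encoder", encoder_weights), ("decoder", decoder_weights)):
    print(name)
    for row in weights:
        print("  ", [round(w, 3) for w in row])
```

With the encoder mask, each row spreads its attention over all three tokens. With the causal mask, the first token can attend only to itself, the second to the first two, and only the last token sees the whole sequence.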