Encoder

Appears in 2 papers

The first half of the seq2seq architecture.

As used in Paper 06: Sequence to Sequence Learning with Neural Networks

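The first half of the seq2seq architecture. An RNN/LSTM that reads the input sequence one token at a time and compresses it into a fixed-length vector (its final hidden state). The decoder then conditions on that vector to generate the output sequence.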
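The Python below is a minimal sketch of the idea. It runs a plain tanh RNN over a sequence of input vectors and returns the last hidden state as the fixed-size encoding. It simplifies on purpose: the paper uses a multi-layer LSTM, and this sketch substitutes a single-layer Elman RNN. The function name `rnn_encode` and the random, untrained weights are hypothetical and serve only as illustration.

```python
import math
import random


def rnn_encode(inputs, W_xh, W_hh, b_h):
    """Return the final hidden state of a simple RNN run over `inputs`.

    Assumption: a single-layer tanh (Elman) RNN stands in for the paper's
    multi-layer LSTM. All names here are hypothetical.

    inputs: list of input vectors (lists of floats), one per time step.
    W_xh:   hidden_size x input_size input weights (list of rows).
    W_hh:   hidden_size x hidden_size recurrent weights.
    b_h:    bias vector of length hidden_size.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # initial hidden state h_0
    for x in inputs:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the right-hand side
        # is built entirely from the previous h before it is replaced.
        h = [
            math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
    return h  # fixed-size encoding of the whole sequence


# Random, untrained weights, purely for illustration.
random.seed(0)
input_size, hidden_size = 3, 4
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

short_seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
long_seq = short_seq * 5  # 10 time steps
print(len(rnn_encode(short_seq, W_xh, W_hh, b_h)), len(rnn_encode(long_seq, W_xh, W_hh, b_h)))  # 4 4
```

Both sequences map to vectors of the same size. That is the encoder's job here: the decoder receives a fixed-length summary whatever the input length. An LSTM adds gating to the same recurrence so that information survives across longer sequences.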

As used in Paper 11: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

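A Transformer component that processes the full input sequence with no causal masking, so every token can attend to every other token, including tokens that come after it. Contrast this with the decoder, which masks future tokens so that each position attends only to itself and earlier positions. BERT uses only the encoder stack, with no decoder.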
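The difference between encoder and decoder attention comes down to the mask. The plain-Python sketch below computes scaled dot-product attention weights with and without a causal mask. It is deliberately simplified. The hypothetical `attention_weights` function uses the token vectors directly as both queries and keys. It also leaves out the learned query/key/value projections, the weighted sum over values, and the multiple heads that a real Transformer layer uses.

```python
import math


def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights (softmax over keys) per query.

    causal=False mimics an encoder: every position attends to every position.
    causal=True mimics a decoder: position i attends only to positions 0..i.
    Assumption: no learned projections, no values, a single head.
    """
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out a future token
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        m = max(scores)  # finite, since position i itself is never masked
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Assumption: three toy token vectors serve as both queries and keys.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enc = attention_weights(x, x)               # encoder-style, bidirectional
dec = attention_weights(x, x, causal=True)  # decoder-style, causal

print(enc[0][2] > 0)  # True: token 0 sees token 2
print(dec[0])         # [1.0, 0.0, 0.0]: token 0 sees only itself
```

With `causal=False` (encoder-style), every weight is positive, so the first token already draws on tokens that come after it. With `causal=True` (decoder-style), the upper triangle of the weight matrix is exactly zero.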