Sequence to Sequence Learning with Neural Networks
Seq2Seq — Sutskever, Vinyals, Le (2014)
TL;DR
Before 2014, the strongest machine translation systems were phrase-based statistical pipelines: separately tuned components built around alignment tables and phrase tables. Ilya Sutskever, Oriol Vinyals, and Quoc Le at Google took a different route by joining two Long Short-Term Memory (LSTM) networks. The first network (the encoder) reads a sentence and compresses it into a single fixed-length vector of numbers. The second network (the decoder) takes that vector and unfolds it, one word at a time, into a sentence in the target language.
This architecture, known as Sequence-to-Sequence (seq2seq), showed that a single neural network trained end to end could learn to translate from example sentence pairs alone. It laid the groundwork for the encoder-decoder designs that later language models built on.
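To make the encoder and decoder roles concrete, here is a minimal sketch in plain Python. It is not the paper's model: a simple tanh recurrent update stands in for the LSTM, the weights are random and untrained (so the output words are meaningless), and the vocabularies, `HIDDEN` size, and helper names are all made up for illustration. The point is only the shape of the computation: any input length folds into one fixed-size vector, and the decoder unrolls that vector word by word until it emits an end marker.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

HIDDEN = 4  # context vector size (assumed; real models use far more)
SRC_VOCAB = {"the": 0, "cat": 1, "sat": 2, "down": 3}  # hypothetical
TGT_VOCAB = ["<eos>", "le", "chat", "assis"]           # hypothetical


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def step(w_in, w_rec, x, h):
    """One recurrent update: h_new = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


# Untrained random weights: this shows data flow, not translation quality.
enc_in, enc_rec = rand_matrix(HIDDEN, len(SRC_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
dec_in, dec_rec = rand_matrix(HIDDEN, len(TGT_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
dec_out = rand_matrix(len(TGT_VOCAB), HIDDEN)


def encode(words):
    """Fold a sentence of any length into one HIDDEN-sized vector."""
    h = [0.0] * HIDDEN
    for w in words:
        h = step(enc_in, enc_rec, one_hot(SRC_VOCAB[w], len(SRC_VOCAB)), h)
    return h


def decode(context, max_len=5):
    """Unroll the context vector greedily until <eos> or max_len words."""
    h = context
    prev = 0  # assumption: the <eos> index doubles as the start symbol
    out = []
    for _ in range(max_len):
        h = step(dec_in, dec_rec, one_hot(prev, len(TGT_VOCAB)), h)
        scores = matvec(dec_out, h)
        prev = max(range(len(scores)), key=scores.__getitem__)
        if TGT_VOCAB[prev] == "<eos>":
            break
        out.append(TGT_VOCAB[prev])
    return out


short_ctx = encode(["cat", "sat"])
long_ctx = encode(["the", "cat", "sat", "down"])
translation = decode(long_ctx)
```

Both `short_ctx` and `long_ctx` have exactly `HIDDEN` entries even though the inputs differ in length. That fixed size is what makes the design simple, and it is also the source of the bottleneck discussed under Limitations.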
The journey in one line
We taught networks to remember the past (Paper 04: LSTM). We taught them the meaning of individual words (Paper 05: Word2Vec). Now, we connect these ideas to build a machine that reads an entire thought in one language and writes it out in another.
What you will learn
- How the encoder-decoder architecture bridges two different languages.
- The concept of the context vector: squashing a whole sentence into one fixed-length vector.
- Why reversing the order of the source words made training easier and improved translation quality.
- How teacher forcing acts like a music tutor to speed up training.
- The fundamental bottleneck of this architecture, which sets the stage for attention.
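As a preview of two items in this list, the sketch below shows how one training pair might be prepared for the reverse-input trick and for teacher forcing. The function name `make_training_example` and the `<sos>` / `<eos>` marker strings are illustrative assumptions, not taken from the paper's code.

```python
def make_training_example(source, target, sos="<sos>", eos="<eos>"):
    """Build encoder and decoder sequences for one sentence pair.

    The marker strings are assumptions made for this sketch.
    """
    # Reverse-input trick: the encoder reads the source words back to front.
    encoder_input = list(reversed(source))
    # Teacher forcing: at each step the decoder is fed the correct previous
    # word, so its input is the target shifted right by one position.
    decoder_input = [sos] + list(target)
    # The decoder is trained to predict the next word, ending with <eos>.
    decoder_target = list(target) + [eos]
    return encoder_input, decoder_input, decoder_target


enc_in, dec_in, dec_tgt = make_training_example(
    ["I", "am", "happy"], ["je", "suis", "content"]
)
```

Here `enc_in` is `["happy", "am", "I"]`, `dec_in` is `["<sos>", "je", "suis", "content"]`, and `dec_tgt` is `["je", "suis", "content", "<eos>"]`. At step i the decoder sees `dec_in[i]` and is scored on `dec_tgt[i]`, whatever it predicted at the previous step.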
Sections
- Context: the end of hand-coded rules
- The problem: variable lengths and mixed-up words
- The idea: the courtroom translator
- How it works: encoders, decoders, and backwards input
- The math: a toy translation example
- The code: a toy seq2seq in PyTorch
- Impact: Google Translate to Bhashini
- Limitations: the single-vector bottleneck
- What came next: the road to attention
Resources
- Glossary — every new term used in this paper
- Quiz — 5 questions to test your understanding
- Further reading — blogs, videos, original paper