Further reading: Paper 06, Seq2Seq (2014)
Blogs, videos, code, and Indian-language resources. Start at the top and work down.
The original papers
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. NeurIPS. https://arxiv.org/abs/1409.3215 The paper we just read. Remarkably readable compared to modern papers; pay special attention to their explanation of the reverse-input trick in the introduction.
- Cho, K. et al. (2014). Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP. https://arxiv.org/abs/1406.1078 Posted a few months before Sutskever’s paper, it introduced the RNN encoder-decoder idea with a new gated hidden unit (later named the GRU) instead of LSTMs. Complementary reading.
Blog posts
- Jay Alammar: “Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)”. https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/ Gorgeous animated walkthrough. The first half covers pure seq2seq; the second half previews Paper 07 (attention).
- Chris Olah & Shan Carter: “Attention and Augmented Recurrent Neural Networks” (Distill, 2016). https://distill.pub/2016/augmented-rnns/ Classic visual explanation of RNN-based translation and attention.
- Lilian Weng: “Attention? Attention!” (2018). https://lilianweng.github.io/posts/2018-06-24-attention/ Starts with seq2seq, then builds up to Transformer-era attention. Excellent bridge between Papers 06 → 07 → 08.
Videos
- Andrew Ng: “Sequence to Sequence Models” (DeepLearning.AI Coursera). https://www.coursera.org/lecture/nlp-sequence-models/basic-models-HAPhR A clear, step-by-step explanation of the basic encoder-decoder model. Free to audit.
- Stanford CS224N: “Machine Translation, Seq2Seq and Attention”. https://www.youtube.com/watch?v=XXtpJxZBa2c A lecture from Chris Manning’s course covering the same material at grad-student depth.
- Andrej Karpathy: “The Unreasonable Effectiveness of Recurrent Neural Networks” (a blog post rather than a video). https://karpathy.github.io/2015/05/21/rnn-effectiveness/ Not about seq2seq specifically, but the best intuition-builder for why RNNs generate coherent text at all.
Code and tutorials
- Official PyTorch seq2seq tutorial: “NLP From Scratch: Translation with a Sequence to Sequence Network and Attention”. https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html Walks through building a French-to-English translator from scratch, starting with pure seq2seq and then adding attention. Directly expands on the toy code in Section 6.
- TensorFlow NMT tutorial. https://www.tensorflow.org/text/tutorials/nmt_with_attention The TensorFlow equivalent, translating Spanish to English this time.
- OpenNMT. https://opennmt.net/ An open-source toolkit for NMT that started in the seq2seq era and still supports the architecture.
Indian-language projects you can try
- AI4Bharat: IndicTrans2. https://github.com/AI4Bharat/IndicTrans2 Open-source, pretrained translation models covering all 22 scheduled Indian languages. A Transformer-based descendant of this paper’s encoder-decoder architecture.
- AI4Bharat: IndicBART. https://github.com/AI4Bharat/indic-bart An encoder-decoder Transformer pre-trained on Indian languages, and a modern descendant of seq2seq.
- Bhashini (National Language Translation Mission). https://bhashini.gov.in/ India’s government-backed translation platform. Read about the Digital India Bhashini Division (DIBD) and the underlying tech.
- Samanantar corpus (AI4Bharat). https://ai4bharat.iitm.ac.in/samanantar About 49.7 million parallel sentence pairs between English and 11 Indian languages. Excellent training dataset if you want to train your own seq2seq model from scratch.
- iNLTK (Natural Language Toolkit for Indic Languages). https://inltk.readthedocs.io/ Simple Python API for tokenization, embeddings, text generation, and language identification in Hindi, Tamil, Bengali, and more.
Academic resources in India
- IIT Madras: AI4Bharat. https://ai4bharat.org/ Home of much of the state-of-the-art open-source Indian-language NLP work.
- IIT Bombay: CFILT (Center for Indian Language Technology). http://www.cfilt.iitb.ac.in/ Long history of Indian-language MT, including seq2seq-era work and the Hindi WordNet.
- IIIT Hyderabad: LTRC (Language Technologies Research Centre). http://ltrc.iiit.ac.in/ Parsing and translation research for Indian languages.
Reading order to understand modern NLP
If you’re continuing through this series:
- ✅ Paper 06 (Seq2Seq) — you just finished this.
- Paper 07 (Bahdanau Attention) — patch seq2seq’s bottleneck.
- Paper 08 (Transformer) — throw out LSTMs, keep only attention.
- Paper 10 (GPT-1) and Paper 11 (BERT) — the two faces of modern pretraining.