Paper 04

Further reading — LSTM (1997)

If this paper hooked you, here is a curated reading list. All of it is free. Start with the blog posts, watch the videos, then try the original paper at the end. By the time you get there, it should read much more smoothly.

The original paper

  • Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. PDF: https://www.bioinf.jku.at/publications/older/2604.pdf The paper is dense and its notation is heavier than necessary, but the core equations in Section 4 are exactly what we covered.

  • Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TU München. The original vanishing-gradient analysis, in German. A clean English translation is hard to find, but the 1997 paper above summarises its results.

Videos

  • 3Blue1Brown — “Neural Networks” series, parts 1–4. https://www.3blue1brown.com/topics/neural-networks Not specifically about LSTMs, but covers gradients and backpropagation beautifully. Watch this before attempting the paper.

  • StatQuest with Josh Starmer — “Long Short-Term Memory (LSTM)”. https://www.youtube.com/watch?v=YCzL96nL7j0 Goofy but clear. Walks through an LSTM step by step with exactly the kind of worked example we built in Section 5.

  • Andrej Karpathy — “Let’s build GPT: from scratch, in code, spelled out”. https://www.youtube.com/watch?v=kCc8FmEb1nY Not about LSTMs, but the first 30 minutes explain character-level sequence modelling and the bigram model, which is excellent context for why sequences matter.

Code and tutorials
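
To experiment before picking up a framework, here is a minimal sketch of a single LSTM time step in plain Python. It assumes the widely used variant with a forget gate, which was added after the 1997 paper. The names lstm_step, affine, zero_params and the layout of params are made up for this illustration and do not come from any library.

```python
import math

GATES = ("input", "forget", "output", "candidate")

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine(weights, bias, inputs):
    # weights is a list of rows; returns weights @ inputs + bias.
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell (forget-gate variant, an assumption).

    params maps each name in GATES to a (weights, bias) pair; every
    weight row has len(x) + len(h_prev) entries.
    """
    xh = list(x) + list(h_prev)  # gates see the input and the previous output
    i = [sigmoid(a) for a in affine(*params["input"], xh)]
    f = [sigmoid(a) for a in affine(*params["forget"], xh)]
    o = [sigmoid(a) for a in affine(*params["output"], xh)]
    g = [math.tanh(a) for a in affine(*params["candidate"], xh)]
    # Cell state: keep part of the old memory, write part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: a gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def zero_params(n_in, n_hidden):
    # Hypothetical helper: all-zero weights and biases for every gate.
    return {name: ([[0.0] * (n_in + n_hidden) for _ in range(n_hidden)],
                   [0.0] * n_hidden)
            for name in GATES}

h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], zero_params(1, 2))
print(c)  # [0.5, -1.0]
```

With all-zero parameters every gate sits at 0.5 and the candidate is 0, so the cell state is simply halved at each step. That makes the arithmetic easy to check by hand before swapping in random or trained weights.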

Papers to read before GPT-era stuff

If your goal is to reach Transformers (Paper 08) with clear intuition, read in this order:

  1. This paper (LSTM) — ✅ done.
  2. Word2Vec (Paper 05) — word embeddings as input to sequence models.
  3. Seq2Seq (Paper 06) — encoder-decoder LSTMs for translation.
  4. Bahdanau Attention (Paper 07) — the first attention mechanism, invented to patch the LSTM bottleneck.
  5. Attention Is All You Need (Paper 08) — the transformer.

Indian resources and community

  • AI4Bharat (IIT Madras). https://ai4bharat.org Indian-language NLP research lab. Many of their early translation models for Hindi, Tamil, Bengali, and other Indian languages used LSTM-based sequence-to-sequence architectures.

  • NPTEL course on Deep Learning (IIT Madras). https://nptel.ac.in/courses/106106184 Free video lectures in English, taught by Prof. Mitesh Khapra. Covers LSTMs in Week 6.

  • NPTEL course on Deep Learning for Computer Vision (Prof. Vineeth N Balasubramanian, IIT Hyderabad). https://nptel.ac.in/courses/106106224 Uses LSTMs in the video/sequence modelling sections.

