Further Reading — Paper 03: Backpropagation 1986
The original paper: Learning Representations by Back-propagating Errors, by Rumelhart, Hinton, and Williams (1986)
Four pages in Nature. Remarkably concise and readable: the core algorithm is described in two pages of equations, followed by two pages of experiments. The introduction is worth reading, because the authors’ framing of the representation learning problem is still one of the clearest statements of what deep learning actually does.
Difficulty: Advanced
Watch: Backpropagation, intuitively, by 3Blue1Brown (YouTube)
https://www.youtube.com/watch?v=Ilg3gGewQ5U
Part 3 of 3Blue1Brown’s Neural Networks series. Brilliant visual explanation of how error flows backwards and what the chain rule looks like geometrically. Watch parts 1 and 2 first if you have not already. Free, 14 minutes.
Difficulty: Beginner–Intermediate
Build it yourself: “The spelled-out intro to neural networks and backpropagation: building micrograd” by Andrej Karpathy (YouTube)
https://www.youtube.com/watch?v=VMj-3S1tku0
Karpathy builds a small neural network library, which he calls “micrograd,” from scratch in Python over roughly two and a half hours, implementing backpropagation line by line. It is one of the best hands-on explanations of backpropagation available. The code is on GitHub, free to clone and run. Strongly recommended for anyone who wants to truly understand backprop.
Difficulty: Intermediate
Geoffrey Hinton’s Nobel Prize lecture (2024)
https://www.nobelprize.org/prizes/physics/2024/hinton/lecture/
A co-author of this paper speaks about 40 years of neural network research, from backpropagation to modern AI. Hinton is an exceptional communicator. The lecture covers the history of ideas, the key breakthroughs, and his concerns about where AI is heading. 45 minutes, free to watch.
Difficulty: Beginner–Intermediate
Practice: Implement backprop for a 3-layer network
No link: this is a coding challenge.
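Try extending the code from Section 6 to three layers of weights: input → hidden1 → hidden2 → output. You will need to:
- Add a third weight matrix, W3, so that each of the three connections (input → hidden1, hidden1 → hidden2, hidden2 → output) has its own weights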
- Extend the forward pass by one more layer
- Extend the backward pass by one more delta computation
If you can do this without looking at a tutorial, you genuinely understand backpropagation. It is a worthwhile exercise.
Difficulty: Intermediate
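If you want something to check your own attempt against, the three steps above can be sketched as follows. This is only one possible shape of a solution, not the Section 6 code: it assumes sigmoid activations, a half sum-of-squares error, no bias terms, and full-batch gradient descent. The names `forward`, `backward`, `numerical_grad`, and `train` are made up for this illustration. The finite-difference check is the useful part: if your analytic gradients disagree with it, one of your deltas is wrong.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, W2, W3):
    # input -> hidden1 -> hidden2 -> output (biases omitted: an assumption)
    h1 = sigmoid(X @ W1)
    h2 = sigmoid(h1 @ W2)
    out = sigmoid(h2 @ W3)
    return h1, h2, out


def loss(X, y, W1, W2, W3):
    # Half sum-of-squares error over the whole batch
    out = forward(X, W1, W2, W3)[2]
    return 0.5 * np.sum((out - y) ** 2)


def backward(X, y, W1, W2, W3):
    # Returns the gradients of the loss with respect to W1, W2, W3
    h1, h2, out = forward(X, W1, W2, W3)
    delta3 = (out - y) * out * (1.0 - out)       # output layer
    delta2 = (delta3 @ W3.T) * h2 * (1.0 - h2)   # hidden2: the extra delta
    delta1 = (delta2 @ W2.T) * h1 * (1.0 - h1)   # hidden1
    return X.T @ delta1, h1.T @ delta2, h2.T @ delta3


def numerical_grad(X, y, weights, k, eps=1e-6):
    # Central finite differences with respect to weights[k], for checking
    grad = np.zeros_like(weights[k])
    for idx in np.ndindex(weights[k].shape):
        original = weights[k][idx]
        weights[k][idx] = original + eps
        up = loss(X, y, *weights)
        weights[k][idx] = original - eps
        down = loss(X, y, *weights)
        weights[k][idx] = original               # restore the weight
        grad[idx] = (up - down) / (2.0 * eps)
    return grad


def train(X, y, hidden=4, lr=0.1, steps=1000, seed=0):
    # Plain full-batch gradient descent; returns weights and loss history
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], hidden, hidden, y.shape[1]]
    weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    history = []
    for _ in range(steps):
        history.append(loss(X, y, *weights))
        for W, g in zip(weights, backward(X, y, *weights)):
            W -= lr * g                          # in-place update
    history.append(loss(X, y, *weights))
    return weights, history


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [1.0], [0.0]])
    weights, history = train(X, y)
    analytic = backward(X, y, *weights)
    for k in range(3):
        gap = np.max(np.abs(analytic[k] - numerical_grad(X, y, weights, k)))
        print(f"W{k + 1}: max gradient mismatch = {gap:.2e}")
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Note how the backward pass mirrors the forward pass: each delta is the next layer's delta pushed back through that layer's weights, then scaled by the local sigmoid derivative. Adding a fourth weight matrix would follow the same pattern.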