Further Reading — Paper 03: Backpropagation 1986

The original paper: Learning Representations by Back-propagating Errors, by Rumelhart, Hinton, and Williams (1986). Four pages in Nature, remarkably concise and readable. The core algorithm takes about two pages of equations, followed by two pages of experiments. The introduction is worth reading: Rumelhart’s framing of the representation learning problem remains one of the clearest statements of what deep learning actually does. Difficulty: Advanced

Watch: Backpropagation, intuitively — 3Blue1Brown (YouTube) https://www.youtube.com/watch?v=Ilg3gGewQ5U Part 3 of 3Blue1Brown’s Neural Networks series. Brilliant visual explanation of how error flows backwards and what the chain rule looks like geometrically. Watch parts 1 and 2 first if you have not. Free, 14 minutes. Difficulty: Beginner–Intermediate

Build it yourself: Neural Networks from Scratch, by Andrej Karpathy (YouTube) https://www.youtube.com/watch?v=VMj-3S1tku0 Karpathy builds a small neural network library, which he calls “micrograd,” from scratch in Python in a little over 2 hours, implementing backpropagation line by line. It is one of the best hands-on explanations of backpropagation available. The code is on GitHub, free to clone and run. Strongly recommended for anyone who wants to truly understand backprop. Difficulty: Intermediate

Geoffrey Hinton’s Nobel Prize lecture (2024) https://www.nobelprize.org/prizes/physics/2024/hinton/lecture/ The co-author of this paper speaking about 40 years of neural network research, from backpropagation to modern AI. Hinton is an exceptional communicator. This lecture covers the history of ideas, the key breakthroughs, and his concerns about where AI is heading. 45 minutes, free to watch. Difficulty: Beginner–Intermediate

Practice: Implement backprop for a 3-layer network. No link; this is a coding challenge.

Try extending the code from Section 6 to three layers: input → hidden1 → hidden2 → output. You will need to:

Add a W3 weight matrix for the extra layer (for example, W2 for hidden1 → hidden2 and W3 for hidden2 → output)
Extend the forward pass by one more layer
Extend the backward pass by one more delta computation

If you can do this without looking at a tutorial, you genuinely understand backpropagation. It is a worthwhile exercise. Difficulty: Intermediate
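
Once you have a version of your own, one way to check it is to compare your gradients against finite differences. The sketch below is not the Section 6 code. It assumes NumPy, sigmoid activations in every layer, a halved sum-of-squares loss, and no bias terms, and the names (forward, backward, numerical_grad) and the XOR-style data are chosen here purely for illustration. If your setup differs, the deltas will look different, but the same check applies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2, W3):
    """Forward pass: input -> hidden1 -> hidden2 -> output."""
    a1 = sigmoid(X @ W1)
    a2 = sigmoid(a1 @ W2)
    a3 = sigmoid(a2 @ W3)
    return a1, a2, a3

def loss(X, y, W1, W2, W3):
    """Sum-of-squares error, halved so its derivative is (a3 - y)."""
    return 0.5 * np.sum((forward(X, W1, W2, W3)[2] - y) ** 2)

def backward(X, y, W1, W2, W3):
    """Return dLoss/dW1, dLoss/dW2, dLoss/dW3 via the chain rule."""
    a1, a2, a3 = forward(X, W1, W2, W3)
    d3 = (a3 - y) * a3 * (1 - a3)        # delta at the output layer
    d2 = (d3 @ W3.T) * a2 * (1 - a2)     # delta at hidden2 (the new step)
    d1 = (d2 @ W2.T) * a1 * (1 - a1)     # delta at hidden1
    return X.T @ d1, a1.T @ d2, a2.T @ d3

def numerical_grad(X, y, Ws, k, eps=1e-5):
    """Central-difference estimate of dLoss/dWs[k], for checking only."""
    grad = np.zeros_like(Ws[k])
    for idx in np.ndindex(*Ws[k].shape):
        old = Ws[k][idx]
        Ws[k][idx] = old + eps
        up = loss(X, y, *Ws)
        Ws[k][idx] = old - eps
        down = loss(X, y, *Ws)
        Ws[k][idx] = old                 # restore the original weight
        grad[idx] = (up - down) / (2 * eps)
    return grad

# Illustrative data and layer sizes (2 -> 4 -> 3 -> 1); all assumptions.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])
Ws = [rng.normal(size=(2, 4)), rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]

analytic = backward(X, y, *Ws)
max_err = max(
    np.max(np.abs(analytic[k] - numerical_grad(X, y, Ws, k))) for k in range(3)
)
print("largest gradient mismatch:", max_err)
```

If the printed mismatch is many orders of magnitude smaller than the gradients themselves, the three delta computations agree with the numerical derivative. A large mismatch in only one matrix usually points to the delta for that layer.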