Historical Context

The wreckage of the first AI winter

By 1986, artificial intelligence had been through its first major crisis and was cautiously climbing back.

The crisis had begun in 1969, when Minsky and Papert published Perceptrons. Their mathematical critique was devastating: a single-layer perceptron can only draw a linear decision boundary, so it cannot learn XOR or any other pattern whose classes are not linearly separable.
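The XOR limitation is easy to reproduce. The sketch below is a minimal illustration, not code from Minsky and Papert or from the 1986 paper: it applies the classic perceptron learning rule to the AND and XOR truth tables. The function name, the learning rate, and the epoch cap are assumptions chosen for the example.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a 2-input threshold unit with the perceptron rule.

    Returns (weights, bias, converged), where converged is True only if
    a full pass over the data produced no classification errors.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - output  # directly observable at the output
            if delta != 0:
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND_DATA)
_, _, xor_converged = train_perceptron(XOR_DATA)

print("AND learned:", and_converged)  # True: AND is linearly separable
print("XOR learned:", xor_converged)  # False: no single line separates XOR
```

Raising the epoch cap does not help with XOR: no setting of two weights and a bias classifies all four rows correctly, so an error-free pass never occurs.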

The obvious fix, adding more layers, was stymied by a problem no one could solve: how do you train the hidden layers? The perceptron learning rule worked only at the output layer, where the error could be measured directly. Hidden-layer neurons produce internal representations that are never compared to any label, so their “mistakes” cannot be observed directly. Adjusting their weights seemed to require information that simply was not available.

This became known as the credit assignment problem — and it stopped the field in its tracks.

Government funding dried up. In the UK, the 1973 Lighthill Report concluded that AI had failed to deliver on its promises. In the US, DARPA, which had poured money into AI throughout the 1960s, sharply cut its support for open-ended AI research around the same time. Research groups shrank. Promising young researchers were advised to work on something else.

The 1970s were a decade of stagnation for neural networks.

The symbolic AI interlude

While neural networks stagnated, a different approach briefly flourished: expert systems.

The logic was appealing: instead of trying to learn knowledge from data, why not just ask human experts to write down everything they know, encode it as rules, and let a computer execute those rules?
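To make the idea concrete, here is a minimal sketch of a rule-based system using forward chaining. It is a generic illustration, not a reconstruction of any historical system, and the rules and fact names are hypothetical placeholders invented for the example.

```python
# Each rule: (set of required facts, fact to conclude).
# These rules are hypothetical and carry no medical authority.
RULES = [
    ({"fever", "cough"}, "suspect_respiratory_infection"),
    ({"suspect_respiratory_infection", "chest_pain"}, "recommend_chest_xray"),
]


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"fever", "cough", "chest_pain"}, RULES)
print(sorted(derived))

# Facts that no rule mentions produce no conclusions at all.
print(sorted(forward_chain({"headache"}, RULES)))
```

All of the system's knowledge lives in the hand-written RULES list. Nothing in the loop learns from data, so covering a new situation means a person has to write a new rule.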

Researchers and companies built expert systems for diagnosing diseases (MYCIN at Stanford), configuring computer systems (XCON at DEC), and analysing geological data for oil exploration. In the early 1980s, these systems seemed like the future of AI. XCON alone was reportedly saving DEC $40 million per year.

Then the limits became apparent. Expert systems were brittle — they worked only within the narrow domain they had been programmed for. Adding new knowledge meant interviewing experts again and rewriting rules by hand. They could not learn. They could not generalise. Maintaining them as the world changed was prohibitively expensive.

By the late 1980s, the expert systems industry was in collapse — triggering the second AI winter of 1987–1993.

But crucially, while expert systems rose and fell, a small group of researchers kept working on neural networks.

The connectionist revival

The early 1980s saw a quiet revival of interest in connectionist models — neural networks — led by a handful of persistent researchers.

In 1982, John Hopfield published a paper on a new kind of recurrent neural network that could store and retrieve memories. Hopfield networks captured widespread attention and drew many physics-trained scientists into the field.
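The "store and retrieve" behaviour can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not Hopfield's original formulation: it uses a Hebbian outer-product rule to store two hand-picked 8-unit patterns of +1/-1 values, then recovers one of them from a corrupted copy by sweeping through the units in a fixed order. The function names, the patterns, and the sweep count are chosen for the example.

```python
def train_hopfield(patterns):
    """Build a symmetric weight matrix with the Hebbian rule (zero diagonal)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Update units one at a time; each takes the sign of its input field."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first unit flipped
recovered = recall(weights, noisy)
print(recovered == p1)  # True: the stored pattern is restored
```

The two patterns are orthogonal, which keeps them from interfering with each other. Storing many correlated patterns in a network this small would make recall unreliable.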

The PDP Research Group at the University of California San Diego — including David Rumelhart, James McClelland, and Geoffrey Hinton — was systematically developing a theoretical framework for connectionist models. Their two-volume book Parallel Distributed Processing (1986) presented neural networks as a serious scientific framework for understanding cognition.

It was within this group that backpropagation was developed, refined, and finally published.

The three authors

David Rumelhart was a cognitive psychologist and computer scientist — the intellectual leader of the PDP group. He was primarily interested in models of human cognition, not AI engineering. His questions were: how does the brain learn? How do internal representations form? Backpropagation was, for him, a theory of biological learning as much as an engineering algorithm.

Geoffrey Hinton was a British-born computer scientist who would go on to become perhaps the most important figure in the deep learning revolution. He helped found the field of deep learning, trained some of its most important early models, and spent decades at the University of Toronto and Google Brain before leaving Google in 2023 to speak openly about AI risks. He shared the 2024 Nobel Prize in Physics with John Hopfield for his contributions to neural networks.

Ronald Williams was a mathematician working with the PDP group at UC San Diego who contributed crucial mathematical details to the backpropagation derivation.

The paper was published in Nature — not a computer science venue, but one of the most prestigious scientific journals in the world. Publishing there was a statement: this is not just engineering. This is science.
