Limitations and Criticism

The XOR problem — what broke the Perceptron

In 1969, Marvin Minsky and Seymour Papert — two of the most respected AI researchers in the world — published a book called Perceptrons. It was a mathematical analysis of what single-layer Perceptrons could and could not do.

Their central finding was devastating: a single Perceptron cannot learn the XOR function.

What is XOR? It stands for “exclusive or.” The output is 1 if exactly one of the two inputs is 1, and 0 if the inputs are the same.

Input 1	Input 2	XOR Output
0	0	0
0	1	1
1	0	1
1	1	0

Try running the code from Section 6 with these labels. The Perceptron never converges. It keeps making mistakes forever.
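The Section 6 code is not repeated here, so the following is a minimal stand-in sketch rather than that exact code. It assumes a step activation, zero-initialised weights, a learning rate of 1, and a cap of 100 epochs; the function name train_perceptron is made up for this example. It trains the same rule on AND (which is linearly separable) and on XOR, and records how many mistakes were made in each pass over the data.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule. Returns weights, bias, and mistakes per epoch."""
    w = [0.0, 0.0]
    b = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - pred  # -1, 0, or +1
            if error != 0:
                # Nudge the line towards classifying this point correctly.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a clean pass means the data is separated
            break
    return w, b, mistakes_per_epoch

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
xor_labels = [0, 1, 1, 0]

_, _, and_history = train_perceptron(samples, and_labels)
_, _, xor_history = train_perceptron(samples, xor_labels)

print("AND converged:", and_history[-1] == 0)  # True
print("XOR converged:", xor_history[-1] == 0)  # False
```

Raising the epoch cap does not help on XOR: every pass contains at least one mistake, because no setting of the weights classifies all four points correctly.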

Why? Because XOR is not linearly separable. Plot Input 1 on the horizontal axis and Input 2 on the vertical axis: the 1s sit at the top-left and bottom-right corners, and the 0s sit at the bottom-left and top-right corners. No single straight line separates the two groups. You need at least two lines, or a curved boundary, to separate them correctly.

And the Perceptron can only draw one straight line. That is its fundamental limit.
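The limit can be checked by hand. Suppose the Perceptron outputs 1 when w1*x1 + w2*x2 + b > 0. XOR then requires b <= 0 (for input 0,0), w2 + b > 0 (for 0,1), w1 + b > 0 (for 1,0), and w1 + w2 + b <= 0 (for 1,1). Adding the two middle inequalities gives w1 + w2 + 2b > 0, so w1 + w2 + b > -b >= 0, which contradicts the last requirement.

The sketch below illustrates the same point by brute force. It is not a proof: it only tries integer weights and biases from -2 to 2, a grid chosen arbitrarily for this example. It collects every truth table a single threshold unit can produce on that grid and lists the ones it never finds.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def threshold_unit(w1, w2, b, x1, x2):
    """Single perceptron: output 1 if the weighted sum plus bias is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# Try every combination of small integer weights and biases (arbitrary grid).
grid = range(-2, 3)
realizable = set()
for w1, w2, b in product(grid, repeat=3):
    truth_table = tuple(threshold_unit(w1, w2, b, x1, x2) for x1, x2 in INPUTS)
    realizable.add(truth_table)

# All 16 possible truth tables over two binary inputs.
all_functions = set(product([0, 1], repeat=4))
missing = sorted(all_functions - realizable)

print(len(realizable), "of 16 Boolean functions found")  # 14 of 16
print("Not found:", missing)  # [(0, 1, 1, 0), (1, 0, 0, 1)]
```

The two missing truth tables are XOR and its negation (XNOR). Every other two-input Boolean function, including AND, OR, and NAND, is within reach of a single unit.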

Why this mattered so much

XOR is not a toy problem. It represents an entire class of patterns that are not linearly separable — patterns where the relationship between inputs and outputs cannot be captured by a simple weighted sum.

Most real-world problems are not linearly separable in their raw inputs. The difference between a cat and a dog in a photograph cannot be described by a single linear threshold on pixel values. The same is true of language understanding and medical diagnosis: no single weighted sum of the raw inputs separates the classes.

Minsky and Papert’s book did not just show that the Perceptron could not learn XOR. It showed that any single-layer Perceptron has deep theoretical limitations on the kinds of functions it can represent.

Their conclusion — which they expressed somewhat too forcefully — was that neural networks in general were unlikely to scale to interesting problems.

The first AI winter

The publication of Perceptrons in 1969 helped trigger what is now called the first AI winter: a dramatic reduction in AI funding and research throughout the 1970s.

The US Department of Defense, which had been generously funding AI research, pulled back. Research groups shrank. Promising directions were abandoned. Rosenblatt himself died in 1971 — whether he knew the winter was coming is unclear.

“AI winter” has become a generic term in the field. There have been two major ones:

First AI winter (usually dated 1974–1980, though funding for neural network research began to dry up soon after 1969): triggered partly by Perceptrons and the failure of early AI to deliver on its promises
Second AI winter (1987–1993): triggered by the failure of “expert systems” — rule-based AI that briefly dominated — to scale

Both winters ended when a new idea arrived that genuinely worked better than what came before.

What Minsky and Papert got wrong

To be fair to history: Minsky and Papert were right that single-layer Perceptrons were limited. They were wrong — or at least premature — in their implication that multi-layer networks would not help.

In the same year their book was published, other researchers were already thinking about networks with hidden layers: intermediate neurons between the input and the output. A network with one hidden layer can represent XOR. Given enough hidden units, it can in fact represent any Boolean function. It just needs more than one layer.
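To make this concrete, here is a small sketch of a network with one hidden layer that computes XOR. The weights are picked by hand for illustration, not learned, and the names step and xor_network are invented for this example. One hidden unit acts as OR, the other as NAND, and the output unit takes the AND of the two.

```python
def step(z):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: two ordinary perceptrons with hand-picked weights.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)      # fires if at least one input is 1
    h_nand = step(-1.0 * x1 - 1.0 * x2 + 1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden units.
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_network(x1, x2))
```

Each unit still draws only one straight line. The hidden layer draws two of them, and the output unit combines the results, which is exactly what a single Perceptron cannot do. What this sketch does not show is how to find such weights automatically.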

The problem in 1969 was not that multi-layer networks were impossible. The problem was that no one knew how to train them. The Perceptron learning rule only worked for the output layer. There was no algorithm for adjusting the weights of hidden layers.

That algorithm — backpropagation — was discovered (and rediscovered, and popularised) over the next two decades. It is the subject of the next paper.

What the Perceptron was, and what it never was

The Perceptron was never a model of general intelligence. It was a model of a single decision — a binary classifier. It said yes or no.

The enthusiasm of 1958 was, in retrospect, excessive. The New York Times and New Yorker articles suggested the Perceptron might soon walk, talk, and think. It could not. It could learn to distinguish two types of images, given simple inputs. That was impressive. It was not intelligence.

The mistake — which is repeated every decade with every new AI breakthrough — is confusing a genuine capability advance with general intelligence. The Perceptron could learn. It could not reason. It could not generalise beyond its training distribution. It had no understanding of what it was classifying.

These limitations are still present, in subtler forms, in the most powerful AI systems today. Every paper in the rest of this timeline is, in some sense, an attempt to address one of them.
