The Mathematics

This is the first paper on Ainiketan where real mathematics appears. Do not be intimidated. Every symbol is explained below, and there is a worked example with actual numbers so you can verify everything with pen and paper.

Mathematical concepts used in this paper

Concept: Vectors
Why needed: The inputs to a Perceptron (pixel values, sensor readings, features) are a list of numbers, which is exactly what a vector is. Thinking of inputs as vectors lets us use compact notation and reason about them geometrically.
Where in paper: Every input to the Perceptron is a vector x = [x₁, x₂, …, xₙ]
Tutorial: Vectors — Introduction

Concept: Dot Product
Why needed: The weighted sum, the core computation of the Perceptron, is exactly the dot product of the weight vector and the input vector.
Where in paper: The forward pass: ŷ = 1 if w · x ≥ θ, otherwise ŷ = 0
Tutorial: Dot Product

Concept: Probability Basics
Why needed: Rosenblatt framed the Perceptron as a probabilistic model. The word “probabilistic” is literally in the paper’s title. He thought of the weights as encoding probabilities of connection strengths in a biological network.
Where in paper: Throughout the theoretical framing of the paper
Tutorial: Probability Basics

The key equation: the Perceptron’s output

The Perceptron’s prediction is:

ŷ = 1   if  (w₁x₁ + w₂x₂ + ... + wₙxₙ) ≥ θ
ŷ = 0   if  (w₁x₁ + w₂x₂ + ... + wₙxₙ) < θ

Where:

x₁, x₂, …, xₙ = the input features (numbers describing the example)
w₁, w₂, …, wₙ = the weights (how important each feature is; these are learned)
θ (theta) = the threshold (the minimum weighted sum needed to output 1)
ŷ (y-hat) = the Perceptron’s prediction (0 or 1)

In compact vector notation:

ŷ = 1   if   w · x ≥ θ
ŷ = 0   if   w · x < θ

Where w · x means the dot product of vectors w and x.
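
The prediction rule can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function names `dot` and `predict`, and the example weights, are assumptions made for this sketch.

```python
def dot(w, x):
    """Dot product: multiply matching entries and add the results."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x, theta):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    return 1 if dot(w, x) >= theta else 0


# Illustrative values, assumed for this sketch only
w = [0.4, 0.7]
theta = 0.5

print(predict(w, [1, 1], theta))  # weighted sum is about 1.1, at or above 0.5: prints 1
print(predict(w, [1, 0], theta))  # weighted sum 0.4 is below 0.5: prints 0
```

Keeping `dot` separate from `predict` mirrors the two steps in the equation: first the weighted sum, then the comparison with θ.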

The key equation: the learning rule

When the Perceptron makes a mistake, it updates each weight:

wᵢ ← wᵢ + η × (y − ŷ) × xᵢ

Where:

wᵢ = the weight being updated (weight for input i)
η (eta) = the learning rate (a small positive number, e.g. 0.1)
y = the correct answer (0 or 1, provided in the training data)
ŷ = the Perceptron’s prediction (0 or 1)
xᵢ = the value of input i for this example
(y − ŷ) = the error: +1 if we predicted 0 but answer was 1; −1 if we predicted 1 but answer was 0; 0 if correct

Notice what happens in each case:

If correct (y = ŷ): error = 0, so wᵢ ← wᵢ + 0 = wᵢ. No change. ✓
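If false negative (y = 1, ŷ = 0): error = +1, so wᵢ increases by η × xᵢ for every input with xᵢ > 0 (a weight whose input is 0 stays the same). The weighted sum for this example rises, so the Perceptron will be more likely to say 1 next time. ✓
If false positive (y = 0, ŷ = 1): error = −1, so wᵢ decreases by η × xᵢ for every input with xᵢ > 0. The weighted sum for this example falls, so the Perceptron will be less likely to say 1 next time. ✓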
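
The three cases can be checked with a short Python sketch of the learning rule. The function name `update` and the numbers are assumptions made for this illustration; the values are chosen to be exactly representable in binary floating point so that the printed results are clean.

```python
def update(w, x, y, y_hat, eta):
    """Apply w_i <- w_i + eta * (y - y_hat) * x_i to every weight."""
    error = y - y_hat
    return [wi + eta * error * xi for wi, xi in zip(w, x)]


w = [0.25, 0.25]  # assumed starting weights
x = [1, 0]        # only the first input is active
eta = 0.5         # assumed learning rate (larger than usual, for exact arithmetic)

print(update(w, x, y=1, y_hat=1, eta=eta))  # correct: prints [0.25, 0.25]
print(update(w, x, y=1, y_hat=0, eta=eta))  # false negative: prints [0.75, 0.25]
print(update(w, x, y=0, y_hat=1, eta=eta))  # false positive: prints [-0.25, 0.25]
```

In every case the second weight is untouched because its input is 0: the rule only moves weights whose inputs took part in the mistake.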

Worked numerical example — full step by step

We train a Perceptron to learn the OR gate: output 1 if either input is 1.

Training data:

x₁	x₂	Correct y
0	0	0
0	1	1
1	0	1
1	1	1

Initial values: w₁ = 0, w₂ = 0, θ = 0.5, η = 0.1

Epoch 1, Example 1: x = [0, 0], y = 0

Weighted sum = (0 × 0) + (0 × 0) = 0
0 < 0.5 → ŷ = 0
Error = y − ŷ = 0 − 0 = 0  → No update

Weights unchanged: w₁ = 0, w₂ = 0

Epoch 1, Example 2: x = [0, 1], y = 1

Weighted sum = (0 × 0) + (0 × 1) = 0
0 < 0.5 → ŷ = 0
Error = 1 − 0 = +1  → UPDATE
w₁ ← 0 + 0.1 × 1 × 0 = 0      (x₁ = 0, so w₁ doesn't change)
w₂ ← 0 + 0.1 × 1 × 1 = 0.1    (x₂ = 1, so w₂ increases)

Weights: w₁ = 0, w₂ = 0.1

Epoch 1, Example 3: x = [1, 0], y = 1

Weighted sum = (0 × 1) + (0.1 × 0) = 0
0 < 0.5 → ŷ = 0
Error = 1 − 0 = +1  → UPDATE
w₁ ← 0 + 0.1 × 1 × 1 = 0.1    (x₁ = 1, so w₁ increases)
w₂ ← 0.1 + 0.1 × 1 × 0 = 0.1  (x₂ = 0, so w₂ unchanged)

Weights: w₁ = 0.1, w₂ = 0.1

Epoch 1, Example 4: x = [1, 1], y = 1

Weighted sum = (0.1 × 1) + (0.1 × 1) = 0.2
0.2 < 0.5 → ŷ = 0
Error = 1 − 0 = +1  → UPDATE
w₁ ← 0.1 + 0.1 × 1 × 1 = 0.2
w₂ ← 0.1 + 0.1 × 1 × 1 = 0.2

Weights: w₁ = 0.2, w₂ = 0.2
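
If you would rather check Epoch 1 by machine than by hand, the four steps above can be reproduced with the sketch below. It assumes the same initial values (w₁ = 0, w₂ = 0, θ = 0.5, η = 0.1); the variable names are chosen for this illustration. Printed values are rounded to two decimals because Python floats can carry tiny rounding errors.

```python
# Training data for the OR gate: (inputs, correct answer)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]  # initial weights
theta = 0.5     # fixed threshold
eta = 0.1       # learning rate

for x, y in data:
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    y_hat = 1 if weighted_sum >= theta else 0
    error = y - y_hat
    # Learning rule: w_i <- w_i + eta * (y - y_hat) * x_i
    w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    print(x, "sum =", round(weighted_sum, 2), "prediction =", y_hat,
          "error =", error, "weights =", [round(wi, 2) for wi in w])
```

The last line printed should show the weights 0.2 and 0.2, matching the hand calculation.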

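After Epoch 1, the Perceptron still makes errors, but the weights have grown from 0 to 0.2. Repeating the same four steps, each further epoch adds 0.1 to both weights (examples 2 and 3 are still misclassified, while example 4 is now correct), so the weights reach w₁ = 0.5, w₂ = 0.5 at the end of Epoch 4. Epoch 5 then passes with no errors:

(0,0) → sum = 0 < 0.5 → output 0 ✓
(0,1) → sum = 0.5 ≥ 0.5 → output 1 ✓
(1,0) → sum = 0.5 ≥ 0.5 → output 1 ✓
(1,1) → sum = 1.0 ≥ 0.5 → output 1 ✓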

The OR gate is learned. Try verifying these by hand on paper.

What the Perceptron Convergence Theorem says

Rosenblatt proved: if the training data is linearly separable, the Perceptron learning rule will always find a set of weights that correctly classifies all training examples, in a finite number of steps.

“Linearly separable” means you can draw a straight line (or, in higher dimensions, a flat plane called a hyperplane) that separates the two classes perfectly.

The AND and OR gates are linearly separable. The XOR gate is not — and that is what broke the Perceptron. We discuss this in Limitations →.
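
The difference between separable and non-separable data can be seen by running the learning rule on all three gates. The sketch below is an illustration under assumptions, not Rosenblatt's procedure verbatim: the function name `train`, the fixed threshold θ = 0.5, the learning rate η = 0.1, and the cap of 100 epochs are all choices made for this example.

```python
def train(data, theta=0.5, eta=0.1, max_epochs=100):
    """Run the learning rule until one full epoch has no errors.

    Returns (weights, epoch) for the first error-free epoch, or
    (weights, None) if max_epochs is reached first. The cap is an
    assumption of this sketch so that the loop always terminates.
    """
    w = [0.0, 0.0]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            y_hat = 1 if s >= theta else 0
            error = y - y_hat
            if error != 0:
                mistakes += 1
                w = [wi + eta * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, epoch
    return w, None


inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
gates = {
    "OR": [0, 1, 1, 1],
    "AND": [0, 0, 0, 1],
    "XOR": [0, 1, 1, 0],
}

for name, targets in gates.items():
    w, epoch = train(list(zip(inputs, targets)))
    if epoch is None:
        status = "no error-free epoch within the cap"
    else:
        status = f"first error-free epoch: {epoch}"
    print(name, status, [round(wi, 2) for wi in w])
```

OR and AND reach an error-free epoch after a handful of passes, while XOR hits the cap. Hitting a cap is only a symptom, of course; the real reason is that no straight line separates the XOR outputs, so no choice of weights can classify all four examples.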
