Eigenvalues and Eigenvectors
1. What is this and why do we care?
Eigenvalues and eigenvectors describe how matrices transform space. For most vectors, a matrix changes both the direction and the magnitude. But for special vectors, called eigenvectors, the matrix only stretches or shrinks them by a scalar factor: the eigenvalue. This sounds abstract, but the stability analysis of every linear recurrent system, from Mamba’s state space model to classical control theory, relies on eigenvalues to determine how information decays or grows over time. In recurrent language models, eigenvalues tell you whether information in a sequence is remembered or forgotten.
2. The geometric picture
Imagine a matrix A as a transformation machine. You put in a vector v, and it spits out a new vector Av.
For most vectors, both the direction and magnitude change:
v = [1, 0] (pointing right)
A·v = [3, 1] (pointing upper-right, stretched)
Direction changed. Magnitude grew.
But for special vectors — eigenvectors — the direction stays the same. The matrix only multiplies by a scalar:
v = [1, 1] (pointing northeast)
A·v = [2, 2] = 2 · [1, 1] = 2 · v
Direction unchanged. Only scaled by 2.
That scalar is the eigenvalue: λ = 2.
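To make the two cases concrete, here is a small sketch that measures the angle between v and Av. The text above does not say which matrix produces these numbers; A = [[3, -1], [1, 1]] is one matrix consistent with both examples and is an assumption of this sketch, as are the helper names `matvec` and `angle_between`.

```python
import math

# Assumed matrix: chosen so that A·[1, 0] = [3, 1] and A·[1, 1] = [2, 2],
# matching the two examples above. The text does not name A explicitly.
A = [[3, -1],
     [1, 1]]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def angle_between(u, w):
    """Signed angle in degrees from 2D vector u to 2D vector w."""
    cross = u[0] * w[1] - u[1] * w[0]
    dot = u[0] * w[0] + u[1] * w[1]
    return math.degrees(math.atan2(cross, dot))

for v in ([1, 0], [1, 1]):
    Av = matvec(A, v)
    print(v, "->", Av, "rotated by", round(angle_between(v, Av), 2), "degrees")
```

A rotation of 0 degrees means Av is a positive multiple of v, so v is an eigenvector with a positive eigenvalue. A rotation of 180 degrees would also indicate an eigenvector, this time with a negative eigenvalue.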
Indian analogy
A spinning top has a special axis — the spin axis. Push the top at an angle and it wobbles (direction changes, precesses). Push it along the spin axis and it just spins faster or slower (same direction, scaled). The spin axis is the eigenvector; how much faster it spins is the eigenvalue.
For a recurrent sequence model: imagine information flowing downstream. Most information muddles (direction changes). But some information — the crucial facts — follow a clean channel (direction unchanged, just diluted by decay). That channel is an eigenvector; the dilution rate is the eigenvalue.
3. The formal definition
Definition: For an n×n matrix A, a non-zero vector v and scalar λ are an eigenvector-eigenvalue pair if:
$$\mathbf{A} \mathbf{v} = \lambda \mathbf{v}$$
- v is the eigenvector (direction that doesn’t change)
- λ is the eigenvalue (scaling factor)
- A and v are compatible: A is n×n, v is n×1
Example:
A = [[3, 1], [0, 2]]
v = [1, 0]ᵀ
A·v = [3, 0]ᵀ = 3·[1, 0]ᵀ = 3·v
So (v, λ = 3) is an eigenvector-eigenvalue pair.
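The definition translates directly into a check. A minimal sketch, where `is_eigenpair` and its tolerance `tol` are illustrative names rather than part of any library:

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, v, lam, tol=1e-9):
    """Return True if v is non-zero and A·v equals lam·v within tol."""
    if all(abs(x) <= tol for x in v):
        return False  # the zero vector is excluded by the definition
    Av = matvec(A, v)
    return all(abs(av - lam * x) <= tol for av, x in zip(Av, v))

A = [[3, 1],
     [0, 2]]

print(is_eigenpair(A, [1, 0], 3))  # the pair from the example above
print(is_eigenpair(A, [0, 1], 2))  # A·[0, 1] = [1, 2], not a multiple of [0, 1]
```

The second call shows that a diagonal entry of A is not automatically paired with the matching coordinate axis: 2 is an eigenvalue of this A, but [0, 1] is not its eigenvector.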
4. How to find eigenvalues: the characteristic equation
To find eigenvalues, we rearrange the definition:
$$\mathbf{A} \mathbf{v} = \lambda \mathbf{v}$$
$$\mathbf{A} \mathbf{v} - \lambda \mathbf{v} = \mathbf{0}$$
$$(\mathbf{A} - \lambda \mathbf{I}) \mathbf{v} = \mathbf{0}$$
where I is the identity matrix.
For a non-trivial solution (v ≠ 0) to exist, the matrix (A - λI) must be singular (non-invertible), which means:
$$\det(\mathbf{A} - \lambda \mathbf{I}) = 0$$
This is the characteristic equation. Solving it gives all eigenvalues.
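For a 2×2 matrix the characteristic equation can be written out in full. As a worked form (the entries a, b, c, d are generic placeholders, not values from this tutorial):

$$\det \begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = (a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

So the eigenvalues of a 2×2 matrix are the roots of a quadratic whose coefficients are the trace a + d and the determinant ad − bc.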
5. Worked example: 2×2 matrix
Let’s find eigenvalues and eigenvectors of:
$$\mathbf{A} = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$$
Step 1: Form A - λI
$$\mathbf{A} - \lambda \mathbf{I} = \begin{bmatrix} 3 - \lambda & 1 \\ 0 & 2 - \lambda \end{bmatrix}$$
Step 2: Compute the determinant
$$\det(\mathbf{A} - \lambda \mathbf{I}) = (3 - \lambda)(2 - \lambda) - 0 \cdot 1 = (3 - \lambda)(2 - \lambda)$$
Step 3: Solve det = 0
$$(3 - \lambda)(2 - \lambda) = 0$$
$$\lambda_1 = 3, \quad \lambda_2 = 2$$
These are the eigenvalues.
Step 4: Find eigenvector for λ₁ = 3
Solve (A - 3I)v = 0:
$$\begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
From row 1: $0 \cdot v_1 + 1 \cdot v_2 = 0$ → $v_2 = 0$
$v_1$ is free. Choose $v_1 = 1$.
$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
Verify: $\mathbf{A} \mathbf{v}_1 = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ ✓
Step 5: Find eigenvector for λ₂ = 2
Solve (A - 2I)v = 0:
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
From row 1: $v_1 + v_2 = 0$ → $v_1 = -v_2$
Choose $v_2 = 1$, then $v_1 = -1$.
$$\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
Verify: $\mathbf{A} \mathbf{v}_2 = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3 + 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix} = 2 \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ ✓
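The five steps above can be sketched in code for any 2×2 matrix with real eigenvalues. This is an illustrative sketch, not a general-purpose solver: the function names are made up for this example, and complex eigenvalues are deliberately rejected.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of a 2x2 matrix via the characteristic quadratic."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues; not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

def eigenvector_2x2(A, lam, tol=1e-12):
    """One non-zero solution of (A - lam·I) v = 0."""
    (a, b), (c, d) = A
    # Use the first non-zero row (p, q) of A - lam·I; [q, -p] solves p*v1 + q*v2 = 0.
    for p, q in ((a - lam, b), (c, d - lam)):
        if abs(p) > tol or abs(q) > tol:
            return [q, -p]
    return [1.0, 0.0]  # A - lam·I is the zero matrix: every vector works

A = [[3, 1],
     [0, 2]]
for lam in eigenvalues_2x2(A):
    print(lam, eigenvector_2x2(A, lam))
```

The eigenvector returned for λ = 2 is [1, -1], which is −1 times the [-1, 1] found by hand; any non-zero multiple of an eigenvector is still an eigenvector. In practice a library routine such as NumPy's `numpy.linalg.eig` handles general n×n matrices, including complex eigenvalues.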
6. What eigenvalues tell you about stability
In recurrent systems, we repeatedly apply a matrix:
$$\mathbf{x}_{t+1} = \mathbf{A} \mathbf{x}_t$$
After t steps: $\mathbf{x}_t = \mathbf{A}^t \mathbf{x}_0$.
If x_0 is an eigenvector v with eigenvalue λ, each application of A simply multiplies by λ:
$$\mathbf{x}_t = \mathbf{A}^t \mathbf{v} = \mathbf{A}^{t-1} (\lambda \mathbf{v}) = \lambda \mathbf{A}^{t-1} \mathbf{v} = \cdots = \lambda^t \mathbf{v}$$
The scaling is controlled entirely by λ^t:
| Eigenvalue Range | Behavior | Meaning |
|---|---|---|
| $\lvert\lambda\rvert < 1$ | $\lambda^t \to 0$ as $t \to \infty$ | System is stable; information decays (forgets) |
| $\lvert\lambda\rvert = 1$ | $\lvert\lambda^t\rvert$ stays at 1 | Marginal stability; information persists indefinitely |
| $\lvert\lambda\rvert > 1$ | $\lvert\lambda^t\rvert \to \infty$ as $t \to \infty$ | System is unstable; information grows explosively |
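A short sketch can confirm that repeated application of A to an eigenvector matches scaling by λ^t. It reuses the matrix and the eigenvector with λ = 2 from the worked example; `iterate` and `classify` are illustrative helper names.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterate(A, x0, steps):
    """Apply x_{t+1} = A·x_t for the given number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = matvec(A, x)
    return x

def classify(lam, tol=1e-12):
    """Label the long-run behaviour of lam**t by the magnitude of lam."""
    magnitude = abs(lam)
    if magnitude < 1 - tol:
        return "decays"
    if magnitude > 1 + tol:
        return "grows"
    return "persists"

A = [[3, 1],
     [0, 2]]
v = [-1, 1]   # eigenvector of A with eigenvalue 2 (from the worked example)
t = 5
print(iterate(A, v, t))           # five applications of A
print([2 ** t * x for x in v])    # lambda^t · v, computed directly
print(classify(2), classify(0.9), classify(-1))
```

The first two printed vectors agree, and the last line labels the three regimes by magnitude. Because `classify` uses `abs`, it treats negative and complex eigenvalues by magnitude as well.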
For Mamba and state space models
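Mamba’s state transition matrix A is designed so that, after discretisation, the eigenvalues of Ā lie strictly between 0 and 1: A is kept diagonal with negative entries and the step size Δ is positive, so every entry of Ā = exp(ΔA) falls in that range. This ensures the system is stable and information decays naturally over time.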
The key insight: Mamba learns to set eigenvalues appropriately for each task. For important information, it uses eigenvalues close to 1 (slow decay). For noise, it uses eigenvalues close to 0 (fast decay).
7. Numerical example: memory length
Consider a simple scalar recurrence (1×1 matrix):
$$x_{t+1} = 0.9 \cdot x_t$$
This is a matrix A = [[0.9]], so the eigenvalue is λ = 0.9.
Starting with x_0 = 1:
| Time t | x_t | Remaining (%) |
|---|---|---|
| t=0 | 1.000 | 100 |
| t=1 | 0.900 | 90 |
| t=5 | 0.590 | 59 |
| t=10 | 0.349 | 35 |
| t=20 | 0.122 | 12 |
| t=50 | 0.005 | 0.5 |
After 50 time steps with λ = 0.9, only about 0.5% of the original information remains.
Now compare with λ = 0.99:
| Time t | x_t | Remaining (%) |
|---|---|---|
| t=0 | 1.000 | 100 |
| t=50 | 0.605 | 61 |
| t=100 | 0.366 | 37 |
With λ = 0.99, information persists much longer. This difference is critical in language models: larger eigenvalues (closer to 1) = longer memory = model can track long-range dependencies.
Mamba’s selective mechanism dynamically adjusts eigenvalues (via Δ and A) to achieve the right memory length for each input.
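Both tables in this section can be regenerated in a few lines. A minimal sketch (the helper name `remaining` is illustrative):

```python
def remaining(lam, t):
    """Fraction of x_0 left after t steps of x_{t+1} = lam · x_t."""
    return lam ** t

for lam, steps in ((0.9, [0, 1, 5, 10, 20, 50]), (0.99, [0, 50, 100])):
    print(f"lambda = {lam}")
    for t in steps:
        x_t = remaining(lam, t)
        print(f"  t={t:<4d} x_t={x_t:.3f}  remaining={100 * x_t:.1f}%")
```

Changing the step lists or adding other values of λ is an easy way to see how quickly the memory length grows as λ approaches 1.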
8. Summary table: eigenvalues in recurrent systems
| Property | Value | Interpretation |
|---|---|---|
| Eigenvalue magnitude | $\lvert\lambda\rvert < 1$ | System is stable |
| Large $\lvert\lambda\rvert$ (close to 1) | e.g., 0.99 | Slow decay → long memory |
| Small $\lvert\lambda\rvert$ (close to 0) | e.g., 0.1 | Fast decay → short memory |
| Memory length | ~ $1 / (1 - \lvert\lambda\rvert)$ steps | Rough time constant: steps until about 37% (1/e) of the information remains |
| Mamba’s trick | Learn Δ to adjust eigenvalues dynamically | Forget noise quickly, retain facts for longer |
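The rule of thumb 1 / (1 − |λ|) can be compared against exact decay times. A small sketch, assuming 0 < |λ| < 1; the function names are illustrative:

```python
import math

def time_constant_approx(lam):
    """Rule of thumb from the table: 1 / (1 - |lam|)."""
    return 1.0 / (1.0 - abs(lam))

def time_constant_exact(lam):
    """Steps until |lam|**t falls to 1/e."""
    return -1.0 / math.log(abs(lam))

def half_life(lam):
    """Steps until |lam|**t falls to 1/2."""
    return math.log(0.5) / math.log(abs(lam))

# Columns: lambda, rule of thumb, exact 1/e time, half-life
for lam in (0.5, 0.9, 0.99):
    print(lam, round(time_constant_approx(lam), 1),
          round(time_constant_exact(lam), 1), round(half_life(lam), 1))
```

The rule of thumb tracks the 1/e time closely once λ is near 1. The time to fall to one half is shorter, by a factor of roughly ln 2 ≈ 0.69.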
9. Why this matters for deep learning
In Mamba and other recurrent models, the recurrence is:
$$\mathbf{x}_t = \overline{\mathbf{A}} \mathbf{x}_{t-1} + \overline{\mathbf{B}} u_t$$
where $\overline{\mathbf{A}} = \exp(\Delta \mathbf{A})$ (after discretisation).
The eigenvalues of $\overline{\mathbf{A}}$ determine:
- Stability: Do repeated applications blow up or decay?
- Memory: How long does the system remember information?
- Selectivity: Mamba learns to set eigenvalues per input so important tokens are remembered longer.
If a Mamba model struggles to recall facts from long ago, the eigenvalues are probably too small, so the state decays too fast. If it gets confused by noise, the eigenvalues may be too close to 1, so the state decays too slowly.
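The effect of Δ on memory can be seen in a one-dimensional toy version of this recurrence. This is a sketch under stated assumptions, not Mamba's actual code: it uses a single state dimension, a negative continuous-time value a = −1, a fixed B̄ = 1, and hypothetical helper names `discretise` and `run`.

```python
import math

def discretise(a, delta):
    """Scalar version of A_bar = exp(delta * A)."""
    return math.exp(delta * a)

def run(a_bar, b_bar, inputs, x0=0.0):
    """Scalar recurrence x_t = a_bar * x_{t-1} + b_bar * u_t."""
    x, states = x0, []
    for u in inputs:
        x = a_bar * x + b_bar * u
        states.append(x)
    return states

a = -1.0                      # assumed: a negative continuous-time eigenvalue
impulse = [1.0] + [0.0] * 4   # one input at the first step, then silence
for delta in (0.01, 0.1, 1.0):
    a_bar = discretise(a, delta)
    # b_bar = 1.0 is a placeholder; the discretisation of B is not modelled here
    print(delta, round(a_bar, 3), [round(x, 3) for x in run(a_bar, 1.0, impulse)])
```

With a negative a and a positive Δ, the discretised eigenvalue always lands between 0 and 1. A small Δ keeps it near 1, so the impulse lingers; a large Δ pushes it toward 0, so the impulse is forgotten quickly.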
Self-Check Questions
Q1: For the matrix A = [[2, 0], [0, 3]], what are the eigenvalues?
Q2: If λ = 0.5, after how many steps has the information decayed to 1%?
Q3: In Mamba, why would you want different eigenvalues for different state dimensions?
Answers
Q1: Since A is diagonal, eigenvalues are the diagonal entries: λ₁ = 2, λ₂ = 3.
Q2: We want 0.5^t = 0.01. Taking logarithms: t·log(0.5) = log(0.01) → t = log(0.01)/log(0.5) ≈ 6.64. So after 7 steps, less than 1% remains.
Q3: Different state dimensions may represent information at different timescales. Some dimensions track short-term patterns (fast decay, small |λ|); others track long-term dependencies (slow decay, large |λ|). Mamba’s selective mechanism learns this trade-off per input.
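The calculation in the answer to Q2 generalises to any eigenvalue and any target fraction. A minimal sketch, assuming 0 < |λ| < 1 and 0 < fraction < 1 (the function name is illustrative):

```python
import math

def steps_to_fraction(lam, fraction):
    """Smallest whole number of steps t with |lam|**t <= fraction.

    Floating-point rounding can matter when the ratio of logarithms
    is an exact integer; this sketch does not guard against that.
    """
    return math.ceil(math.log(fraction) / math.log(abs(lam)))

print(steps_to_fraction(0.5, 0.01))  # the Q2 case
print(steps_to_fraction(0.9, 0.01))  # same target with slower decay
```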
Where This Shows Up Next
- Paper 21 (Mamba): Eigenvalues of the state matrix A control memory length. Mamba learns to set them dynamically.
- Control Theory: Eigenvalues determine system stability (taught in every engineering curriculum).
- Deep Learning: Eigenvalues of weight matrices affect training dynamics and gradient flow (vanishing/exploding gradients).
- Graph Neural Networks: Eigenvalues of adjacency matrices determine information spread through graphs.