Convolutional Mode (Training)

Appears in 1 paper

During training, the recurrence x_t = Āx_{t-1} + B̄u_t can be unrolled and rearranged as a convolution: output = conv(input, kernel).

As used in Paper 21 — Mamba: Linear-Time Sequence Modeling with Selective State Spaces →

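The rearrangement works when Ā and B̄ are the same at every time step. With the standard SSM readout y_t = Cx_t and a zero initial state, unrolling the recurrence gives y_t = Σ_{k=0..t} C Ā^k B̄ u_{t-k}, which is a causal convolution of the input with the kernel K = [CB̄, CĀB̄, CĀ²B̄, ...]. The kernel is therefore determined by powers of the discretized matrix Ā: [Ā^0, Ā^1, Ā^2, ...]. For a sequence of length n, the convolution can be computed via FFT in O(n log n) time and parallelized across the whole sequence, instead of stepping through it one position at a time.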
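
The equivalence between the two modes can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a single-input, single-output SSM with a small diagonal Ā, a readout vector C, and a zero initial state, and all function names (ssm_recurrent, ssm_kernel, ssm_convolutional) are hypothetical. The kernel is built with a plain loop for clarity, so only the convolution step itself runs in O(n log n).

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, u):
    """Sequential mode: x_t = A_bar x_{t-1} + B_bar u_t, y_t = C x_t, zero initial state."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_t in u:
        x = A_bar @ x + B_bar * u_t
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(A_bar, B_bar, C, n):
    """Kernel entries K[k] = C A_bar^k B_bar for k = 0..n-1 (naive loop, for clarity)."""
    K = np.empty(n)
    v = B_bar.copy()          # holds A_bar^k B_bar
    for k in range(n):
        K[k] = C @ v
        v = A_bar @ v
    return K

def ssm_convolutional(A_bar, B_bar, C, u):
    """Convolutional mode: y = causal_conv(u, K), computed with an FFT."""
    n = len(u)
    K = ssm_kernel(A_bar, B_bar, C, n)
    m = 2 * n                 # zero-pad so the circular convolution equals the linear one
    y = np.fft.irfft(np.fft.rfft(u, m) * np.fft.rfft(K, m), m)
    return y[:n]              # keep the first n (causal) outputs

# Assumed toy sizes: state dimension 4, sequence length 64.
rng = np.random.default_rng(0)
state_dim, seq_len = 4, 64
A_bar = np.diag(rng.uniform(0.1, 0.9, state_dim))  # stable diagonal A_bar (assumption)
B_bar = rng.normal(size=state_dim)
C = rng.normal(size=state_dim)
u = rng.normal(size=seq_len)

y_rec = ssm_recurrent(A_bar, B_bar, C, u)
y_conv = ssm_convolutional(A_bar, B_bar, C, u)
print(np.allclose(y_rec, y_conv))
```

Both functions should produce the same outputs up to floating-point error. The sketch relies on Ā, B̄, and C being fixed across time steps; if they varied with the input, a single kernel K would no longer exist.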