Step Size (Δ)

Appears in 1 paper

A positive scalar (or per-head scalar) that sets the step used to discretise a continuous-time state space model.

As used in Paper 21, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces":

In Mamba, Δ is input-dependent: it is computed at every time step as Δ_t = softplus(Linear(u_t)), where u_t is the input at step t. The softplus keeps Δ_t strictly positive. A larger Δ means the model "takes bigger steps" through time; a smaller Δ means finer temporal resolution.
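The formula above can be sketched in a few lines of plain Python. This is a minimal illustration and not Mamba's implementation: the projection weights, the inputs, and the scalar state parameters `a` and `b` are made-up values, and the zero-order-hold discretisation rule is an assumption added here to show what "bigger steps" does to the discretised system.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); the result is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def step_size(u_t, weight, bias=0.0):
    """Delta_t = softplus(Linear(u_t)) for a single time step.

    `weight` and `bias` are hypothetical parameters of the linear projection.
    """
    pre_activation = sum(w * u for w, u in zip(weight, u_t)) + bias
    return softplus(pre_activation)


def discretise(delta, a, b):
    """Zero-order-hold discretisation of the scalar system x' = a*x + b*u.

    Assumes a != 0. Returns (a_bar, b_bar) for x_t = a_bar*x_{t-1} + b_bar*u_t.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


weight = [0.5, -0.25]                     # made-up projection weights
small = step_size([-4.0, 4.0], weight)    # pre-activation is -3.0
large = step_size([4.0, -4.0], weight)    # pre-activation is +3.0

a_bar_small, b_bar_small = discretise(small, a=-1.0, b=1.0)
a_bar_large, b_bar_large = discretise(large, a=-1.0, b=1.0)

print(f"small delta={small:.4f}: a_bar={a_bar_small:.4f}, b_bar={b_bar_small:.4f}")
print(f"large delta={large:.4f}: a_bar={a_bar_large:.4f}, b_bar={b_bar_large:.4f}")
```

With a negative `a`, the small step leaves `a_bar` close to 1, so the previous state is mostly carried forward and the current input contributes little. The large step pushes `a_bar` toward 0 and `b_bar` toward its maximum, so the state is largely overwritten by the current input.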