Step Size (Δ)
A scalar (or per-head scalar) that controls the discretisation rate.
A scalar (or per-head scalar) that controls the discretisation rate. Computed as Δ_t = softplus(Linear(u_t)). Larger Δ means the model "takes bigger steps" through time; smaller Δ means finer temporal resolution. Input-dependent in Mamba.