Gradient Descent

Appears in 1 paper

The optimisation algorithm that trains neural networks.

As used in Paper 03 — Learning Representations by Back-propagating Errors →

Gradient descent minimises a loss function by repeatedly adjusting the network's weights. At each step, compute the gradient of the loss with respect to every weight (using backpropagation), then move each weight a small step, scaled by the learning rate, in the direction opposite to its gradient. Repeat until the loss stops decreasing or is small enough. The "descent" is moving downhill on the loss landscape.
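The update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's procedure: the one-parameter toy loss, the names `loss`, `grad` and `gradient_descent`, and the learning rate of 0.1 are all chosen here for the example, and the gradient is written out by hand where a real network would obtain it from backpropagation.

```python
def loss(w):
    # Hypothetical toy loss with its minimum at w = 3 (chosen for illustration).
    return (w - 3.0) ** 2


def grad(w):
    # Analytic derivative of the toy loss.
    # A neural network would compute this with backpropagation instead.
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    # Repeatedly step in the direction opposite to the gradient.
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

With this step size the weight moves steadily towards the minimum at 3. A much larger learning rate would overshoot and can diverge, which is why the step is kept small.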

As used in Paper 03 — Learning Representations by Back-propagating Errors →

Stochastic gradient descent (SGD) is a variant of gradient descent where, instead of computing the gradient on the full training set, you compute it on one example (stochastic) or a small batch of examples (mini-batch SGD) at each step. Each step is much cheaper than a full-batch step, and the noise in the gradient estimate can help the weights escape local minima.
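As a rough sketch of the mini-batch idea, the example below fits a line y = w*x + b to a small synthetic dataset, estimating the gradient from a few shuffled examples at each step instead of the whole set. Everything here is an assumption made for illustration: the function name `sgd_linear_fit`, the mean-squared-error loss, the noiseless data generated from y = 2x + 1, and the hyperparameter values.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.1, batch_size=4, epochs=200, seed=0):
    # Fit y = w * x + b by mini-batch SGD on mean squared error.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient estimate from this batch only, not the full training set.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Same update rule as full-batch gradient descent.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic, noiseless data drawn from the hypothetical line y = 2x + 1.
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(w, b)
```

Setting `batch_size=1` gives the one-example (purely stochastic) form, and setting it to `len(xs)` recovers full-batch gradient descent.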