Gradient Descent
The optimisation algorithm that trains neural networks. At each step: compute the gradient of the loss with respect to the weights (using backpropagation), then update each weight by a small step, scaled by the learning rate, in the direction opposite to the gradient. Repeat until the loss is small enough or stops improving. The "descent" is moving downhill on the loss landscape.
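The update loop can be sketched in a few lines of Python. This is a minimal illustration, not a training framework: the names gradient_descent and toy_grad, the learning rate of 0.1 and the toy quadratic loss are all assumptions chosen for the example, and the gradient is written out by hand where a real network would use backpropagation.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0.

    grad is a function returning the gradient (a list) at the given weights.
    """
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        # Move each weight a small step opposite to its gradient component.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical toy loss: L(w) = (w[0] - 3)^2 + (w[1] + 1)^2, minimised at (3, -1).
def toy_grad(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


w_final = gradient_descent(toy_grad, [0.0, 0.0])
print(w_final)  # ends up very close to [3, -1]
```

On this toy loss every step shrinks the distance to the minimum by a constant factor, so the weights settle near (3, -1). On a real loss landscape, a learning rate that is too large can overshoot and diverge instead.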
Stochastic gradient descent (SGD) is a variant of gradient descent where, instead of computing the gradient on the full training set, you compute it on one example (stochastic) or a small batch of examples (mini-batch SGD) at each step. Each step is much cheaper, and the noise in the gradient estimate can help escape shallow local minima.
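As a rough sketch of the mini-batch idea, the Python below fits a line y = w*x + b by estimating the gradient from a few randomly chosen examples at each step. The function name sgd_linear_fit, the hyperparameters and the synthetic noise-free data are assumptions made for illustration only.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.05, batch_size=4, epochs=500, seed=0):
    """Fit y = w*x + b with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient estimated from this batch only, not the full training set.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (an assumption for the demo).
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]
w_fit, b_fit = sgd_linear_fit(xs, ys)
print(w_fit, b_fit)  # should land close to 2 and 1
```

Setting batch_size=1 gives the single-example (purely stochastic) form, while batch_size=len(xs) recovers full-batch gradient descent.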