Next-token prediction
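The pre-training objective: given all previous tokens, predict the probability distribution over the next token. Training on this objective, by minimising the cross-entropy between the predicted distribution and the token that actually follows, is equivalent to maximising the log-likelihood of the training text.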
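The link between the per-token loss and the log-likelihood can be seen in a small sketch. It assumes a hypothetical model whose output at each position is already available as a dictionary of token probabilities; the function names, the three-word vocabulary, and the probabilities are illustrative only and not taken from any particular library.

```python
import math

def sequence_log_likelihood(predicted, tokens):
    """Sum of log p(token_t | all earlier tokens) over the sequence."""
    return sum(math.log(dist[tok]) for dist, tok in zip(predicted, tokens))

def next_token_loss(predicted, tokens):
    """Average cross-entropy against the token that actually came next."""
    return -sequence_log_likelihood(predicted, tokens) / len(tokens)

# Illustrative (made-up) model outputs: one distribution per position,
# each assumed to be conditioned on all tokens before that position.
tokens = ["the", "cat", "sat"]
predicted = [
    {"the": 0.5, "cat": 0.25, "sat": 0.25},
    {"the": 0.125, "cat": 0.5, "sat": 0.375},
    {"the": 0.25, "cat": 0.25, "sat": 0.5},
]

log_likelihood = sequence_log_likelihood(predicted, tokens)
loss = next_token_loss(predicted, tokens)
print(f"log-likelihood = {log_likelihood:.4f}, loss = {loss:.4f}")
```

Because the loss is the log-likelihood negated and divided by the fixed sequence length, any change to the model that lowers the loss raises the log-likelihood by a proportional amount, which is why the two objectives pick out the same model.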