Autoregressive language model
A model that generates a sequence by predicting one token at a time, conditioning each prediction on all previously generated tokens.
By the chain rule of probability, the probability of a sequence x_1, ..., x_T factorizes into per-token conditionals: P(x_1, ..., x_T) = prod_{t=1}^{T} P(x_t | x_1, ..., x_{t-1}).
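As a minimal sketch of this factorization, the Python below scores and generates sequences with a toy model. The names TABLE, next_token_probs, sequence_log_prob, and generate_greedy, the "<s>" and "</s>" boundary tokens, and all probabilities are hypothetical and chosen only for illustration. The toy model looks only at the last token, whereas a real autoregressive model would condition on the whole prefix.

```python
import math

# Hypothetical toy conditional distributions, invented for illustration.
# Keys are the most recent token; values map each next token to its probability.
TABLE = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.25, "</s>": 0.25},
    "a": {"cat": 0.25, "dog": 0.5, "</s>": 0.25},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def next_token_probs(prefix):
    """Return P(next token | prefix).

    The interface takes the full prefix, as an autoregressive model would,
    but this toy version only uses the last token.
    """
    return TABLE[prefix[-1]]


def sequence_log_prob(tokens):
    """Sum log P(token | previous tokens) over the sequence (chain rule)."""
    prefix = ["<s>"]
    total = 0.0
    for tok in tokens:
        p = next_token_probs(prefix).get(tok, 0.0)
        if p == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        total += math.log(p)
        prefix.append(tok)  # the scored token becomes part of the context
    return total


def generate_greedy(max_len=10):
    """Generate one token at a time, always picking the most likely next token."""
    prefix = ["<s>"]
    while len(prefix) - 1 < max_len:
        probs = next_token_probs(prefix)
        tok = max(probs, key=probs.get)
        prefix.append(tok)  # feed the prediction back in as context
        if tok == "</s>":
            break
    return prefix[1:]


if __name__ == "__main__":
    seq = generate_greedy()
    print(seq, math.exp(sequence_log_prob(seq)))
```

Summing log-probabilities rather than multiplying raw probabilities is a common choice because the product of many small numbers underflows for long sequences.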