Autoregressive language model

Appears in 1 paper

A model that generates a sequence by predicting one token at a time, conditioning each prediction on all previously generated tokens.

As used in Paper 10: Improving Language Understanding by Generative Pre-Training

Here the model assigns a probability to a whole sentence by the chain rule: the probability of the sentence is the product of each token's probability given the tokens before it. For tokens x_1, ..., x_n: P(x_1, ..., x_n) = P(x_1) × P(x_2 | x_1) × ... × P(x_n | x_1, ..., x_{n-1}).
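The definition can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-word vocabulary, the probability table, and the function names next_token_probs, sequence_log_prob, and greedy_generate are hypothetical and do not come from the paper. The toy model looks only at the last token so the arithmetic stays easy to check by hand, whereas a real autoregressive model conditions on the full prefix.

```python
import math

# Hypothetical toy conditional distributions, invented for illustration only.
# Key: the last token of the prefix (None for an empty prefix).
# Value: P(next token | prefix) over a four-word vocabulary.
TABLE = {
    None:  {"the": 0.70, "cat": 0.10, "sat": 0.10, "<eos>": 0.10},
    "the": {"the": 0.05, "cat": 0.80, "sat": 0.10, "<eos>": 0.05},
    "cat": {"the": 0.10, "cat": 0.05, "sat": 0.75, "<eos>": 0.10},
    "sat": {"the": 0.20, "cat": 0.05, "sat": 0.05, "<eos>": 0.70},
}


def next_token_probs(prefix):
    """Return P(next token | prefix) as a dict.

    Stand-in for a real model: it inspects only the last token of the
    prefix. Sequences are assumed to end at "<eos>".
    """
    last = prefix[-1] if prefix else None
    return TABLE[last]


def sequence_log_prob(tokens):
    """Chain rule: sum of log P(token_i | tokens before i)."""
    log_p = 0.0
    for i, tok in enumerate(tokens):
        log_p += math.log(next_token_probs(tokens[:i])[tok])
    return log_p


def greedy_generate(max_len=10):
    """Generate one token at a time, feeding each choice back as context."""
    tokens = []
    while len(tokens) < max_len:
        probs = next_token_probs(tokens)
        tok = max(probs, key=probs.get)  # greedy: pick the most likely token
        tokens.append(tok)
        if tok == "<eos>":
            break
    return tokens


if __name__ == "__main__":
    sentence = greedy_generate()
    print(sentence)
    print(math.exp(sequence_log_prob(sentence)))
```

With this table, greedy generation yields the, cat, sat, <eos>, and the sentence probability is 0.7 × 0.8 × 0.75 × 0.7 = 0.294. Summing log-probabilities rather than multiplying raw probabilities is a common way to avoid numerical underflow on long sequences.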