Pre-training

Appears in 3 papers

Training a model on large-scale, typically unlabelled data before fine-tuning.

As used in Paper 10 — Improving Language Understanding by Generative Pre-Training →

In GPT-1, pre-training is the first of two stages: the model is trained on unlabelled text from BooksCorpus with a next-token prediction (language modelling) objective, and only afterwards fine-tuned on labelled data for each downstream task.
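The next-token prediction objective can be illustrated with a small sketch. It is not GPT-1's implementation: a smoothed bigram count model stands in for the Transformer, and the names `next_token_pairs`, `train_bigram`, and `next_token_loss`, along with the toy sentence, are hypothetical and chosen only for illustration. The sketch shows how raw text supplies its own training targets (each token is the label for the one before it) and how the average negative log-likelihood falls once the model has seen the data.

```python
import math
from collections import Counter, defaultdict


def next_token_pairs(tokens):
    """Turn an unlabelled token sequence into (previous, next) training pairs."""
    return list(zip(tokens[:-1], tokens[1:]))


def train_bigram(tokens):
    """Count how often each token follows each previous token.

    Assumption: a bigram table stands in for a neural language model.
    """
    counts = defaultdict(Counter)
    for prev, nxt in next_token_pairs(tokens):
        counts[prev][nxt] += 1
    return counts


def next_token_loss(counts, tokens, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each next token (add-alpha smoothing)."""
    pairs = next_token_pairs(tokens)
    total = 0.0
    for prev, nxt in pairs:
        row = counts.get(prev, {})
        row_total = sum(row.values())
        prob = (row.get(nxt, 0) + alpha) / (row_total + alpha * vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


# Toy "corpus": no labels are needed, the text itself provides the targets.
tokens = "the cat sat on the mat".split()
vocab_size = len(set(tokens))

untrained_loss = next_token_loss({}, tokens, vocab_size)  # uniform guess
trained_loss = next_token_loss(train_bigram(tokens), tokens, vocab_size)

print(f"untrained: {untrained_loss:.3f}  trained: {trained_loss:.3f}")
```

A real pre-training run minimises the same kind of loss, but conditions on a longer context window and updates neural network weights by gradient descent rather than counting.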

As used in Paper 12 — Language Models are Few-Shot Learners →

Training a language model on diverse, unlabelled text drawn largely from the internet (Common Crawl, books, Wikipedia) so that it learns general language patterns. Pre-training requires massive compute but is performed only once; downstream tasks then reuse the resulting model rather than training a new one from scratch.

As used in Paper 13 — Scaling Laws for Neural Language Models →

The initial training phase where a language model learns from large amounts of unlabeled data. After pre-training, the model can be fine-tuned on specific tasks or used zero-shot.