Masked Language Modelling (MLM)

Appears in 1 paper

BERT's primary pre-training objective.

As used in Paper 11 (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding)

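Selects 15% of the input token positions at random. Each selected token is replaced with [MASK] 80% of the time, replaced with a random token 10% of the time, and left unchanged the remaining 10% of the time. The model is trained to predict the original token at every selected position, using context from both the left and the right. Analogous to the Cloze test in educational psychology, a fill-in-the-blank exercise.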
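
The corruption step can be sketched in a few lines of Python. This is an illustrative sketch only, not BERT's reference implementation: the function name mask_tokens, its parameters, and the toy vocabulary are hypothetical, each position is selected independently with probability 0.15 (a simplification of picking a fixed share of positions per sequence), and the input is assumed to contain no special tokens such as [CLS] or [SEP].

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, select_prob=0.15, seed=None):
    """Corrupt a token list for masked language modelling.

    Returns (corrupted, labels). labels[i] holds the original token at
    every selected position and None everywhere else, so a training loss
    would only be computed where labels[i] is not None.
    Hypothetical helper for illustration; names are assumptions.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected: left as-is, no prediction target
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
    corrupted, labels = mask_tokens(sentence, toy_vocab, seed=0)
    print(corrupted)
    print(labels)
```

Keeping some selected tokens unchanged, and swapping others for random tokens, means the model cannot assume that every non-[MASK] token is correct, which reduces the mismatch with fine-tuning, where [MASK] never appears.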