WordPiece
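BERT's subword tokenisation algorithm. It splits rare or unknown words into smaller pieces that are in the vocabulary, matching the longest vocabulary entry first at each position. Example (illustrative; the actual split depends on the vocabulary): "unbelievable" → ["un", "##believable"]. The ## prefix marks a piece that continues the preceding one rather than starting a new word. Vocabulary size: 30,522 for the uncased BERT-base model.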
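
The splitting step can be sketched in a few lines of Python. This is a minimal illustration of greedy longest-match-first lookup, not BERT's actual implementation: the function name wordpiece_tokenize and the toy vocabulary are hypothetical, and the real tokeniser also handles lowercasing, punctuation splitting, and a maximum word length.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Nothing in the vocabulary fits here, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary for illustration only (an assumption, not BERT's real vocabulary).
toy_vocab = {"un", "##believable", "##aff", "##able", "[UNK]"}

print(wordpiece_tokenize("unbelievable", toy_vocab))  # ['un', '##believable']
print(wordpiece_tokenize("unaffable", toy_vocab))     # ['un', '##aff', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))           # ['[UNK]']
```

With this toy vocabulary, a word that cannot be fully covered by vocabulary entries collapses to a single [UNK] token instead of a partial split.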