Catastrophic Forgetting

Appears in 1 paper

When a model loses knowledge from pretraining while being fine-tuned on new data.

As used in Paper 15: Training Language Models to Follow Instructions with Human Feedback

In RLHF without a KL penalty, the RL policy could optimize solely for the reward model's score and forget useful general knowledge acquired during pretraining. The KL penalty term mitigates this by penalizing divergence from the supervised fine-tuned (SFT) model, keeping the policy anchored to it.
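A minimal sketch of how such a penalty can enter the reward, assuming the common formulation in which the reward-model score for a sampled response is reduced by beta times the log-ratio between the RL policy and the frozen SFT model, summed over the response tokens. The function name `kl_penalized_reward`, its parameters, and the example beta values are hypothetical illustrations, not code or settings from the paper.

```python
from typing import List


def kl_penalized_reward(
    reward: float,
    logprobs_policy: List[float],
    logprobs_sft: List[float],
    beta: float,
) -> float:
    """Hypothetical helper: reward-model score minus a KL-style penalty.

    logprobs_policy and logprobs_sft are per-token log-probabilities of the
    SAME sampled response under the RL policy and the frozen SFT model.
    beta is an assumed tuning coefficient controlling the anchor strength.
    """
    if len(logprobs_policy) != len(logprobs_sft):
        raise ValueError("both models must score the same tokens")
    # Summing per-token log-ratios gives log pi_RL(y|x) - log pi_SFT(y|x),
    # a single-sample estimate of KL(pi_RL || pi_SFT) for this response.
    log_ratio = sum(p - q for p, q in zip(logprobs_policy, logprobs_sft))
    # The further the policy drifts from the SFT model on this response,
    # the more reward it gives up.
    return reward - beta * log_ratio


# Policy identical to the SFT model: no penalty, reward is unchanged.
print(kl_penalized_reward(1.0, [-0.3, -0.7], [-0.3, -0.7], beta=0.1))
```

When the policy makes a response much more likely than the SFT model would, the log-ratio grows and the effective reward shrinks, so maximizing reward alone no longer pays off if it requires drifting far from the SFT model's behavior. Larger beta values hold the policy closer to the SFT model at the cost of less reward optimization.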

