Catastrophic Forgetting
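When a model loses knowledge acquired during pretraining while being fine-tuned on new data. In RLHF without a KL penalty, the RL policy may optimize solely for the reward-model score and drift far from its starting point, losing useful general knowledge along the way. The KL penalty term mitigates this by penalizing divergence between the policy and the SFT model, which keeps the policy anchored near the SFT model.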
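A minimal sketch of how the KL penalty enters the reward, assuming the common formulation where the reward-model score is reduced by beta times a sampled estimate of KL(policy || SFT), computed as the sum of per-token log-probability differences for one response. The function name `kl_penalized_reward` and the default `beta = 0.1` are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def kl_penalized_reward(
    reward: float,
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical penalty strength; tuned in practice
) -> float:
    """Reward-model score minus a KL penalty toward the frozen SFT model.

    policy_logprobs and ref_logprobs are the per-token log-probabilities of
    the same sampled response under the RL policy and the SFT model.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Sum of per-token log-ratios: a single-sample estimate of KL(policy || SFT).
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    # Larger drift from the SFT model means a larger deduction from the reward.
    return reward - beta * kl_estimate


if __name__ == "__main__":
    # Policy identical to the SFT model: no penalty.
    print(kl_penalized_reward(1.0, [-0.5, -0.25], [-0.5, -0.25]))
    # Policy has drifted (log-ratio sum of 2.0): reward is reduced by beta * 2.0.
    print(kl_penalized_reward(1.0, [-0.5, -0.25], [-1.5, -1.25], beta=0.125))
```

When the policy matches the SFT model the penalty is zero; as the policy assigns its responses much higher probability than the SFT model would, the deduction grows, so maximizing reward alone no longer pays off without limit.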