Sycophancy / Sycophantic Behavior
When a model agrees with the user even when the user is wrong, in order to please them. Sycophancy is a form of alignment failure. InstructGPT still exhibits some sycophancy, though less than GPT-3. Addressing it requires human raters to value accuracy over agreeableness.