Human Feedback (HF)

Appears in 1 paper

Labels provided by humans comparing two AI outputs and indicating which one is better.

As used in Paper 22 — Constitutional AI: Harmlessness from AI Feedback →

In RLHF (Paper 15), these pairwise human preference labels are used to train a reward model, which then supplies the reward signal during reinforcement learning. Constitutional AI aims to replace the human labels for harmlessness with AI-generated feedback.
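To make the link between a comparison label and reward-model training concrete, here is a minimal sketch. It assumes the pairwise logistic (Bradley-Terry) loss that is common for reward models; the entry itself does not say which loss Paper 15 or Paper 22 uses. The `Comparison` record and the `preference_loss` function are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Comparison:
    """One unit of human feedback: two outputs and the labeler's choice (hypothetical schema)."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b", as chosen by the human labeler


def preference_loss(reward_a: float, reward_b: float, preferred: str) -> float:
    """Negative log-likelihood of the human's choice under a Bradley-Terry model.

    Assumption: P(a preferred over b) = sigmoid(reward_a - reward_b).
    """
    if preferred not in ("a", "b"):
        raise ValueError("preferred must be 'a' or 'b'")
    margin = reward_a - reward_b if preferred == "a" else reward_b - reward_a
    # -log(sigmoid(margin)) written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


label = Comparison(
    prompt="Explain photosynthesis.",
    response_a="Plants convert light, water, and CO2 into sugar and oxygen.",
    response_b="It is a thing plants do.",
    preferred="a",
)

# Scores a reward model might assign; the numbers here are made up for illustration.
agrees = preference_loss(reward_a=2.0, reward_b=0.0, preferred=label.preferred)
disagrees = preference_loss(reward_a=0.0, reward_b=2.0, preferred=label.preferred)
```

The loss is small when the reward model ranks the human-preferred output higher and large when it ranks it lower, so minimizing it over many comparisons pushes the model's scores toward the labelers' preferences.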

