Human Preference / Human Feedback

Appears in 1 paper

Judgments by human raters about which model outputs are better.

As used in Paper 15: Training Language Models to Follow Instructions with Human Feedback

In this paper, preferences are collected as pairwise comparisons: raters see output A and output B and pick the better one. The comparisons are used to train the reward model and to validate alignment quality. Inter-rater agreement is typically 70–75%.
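
As a rough illustration of how pairwise comparisons can feed a reward model, the sketch below computes a Bradley-Terry style loss, -log(sigmoid(r_chosen - r_rejected)), which is small when the model scores the human-preferred output higher. It also shows one simple way to measure inter-rater agreement. The names `Comparison`, `pairwise_loss`, and `agreement_rate` are hypothetical and chosen for illustration; the scalar rewards stand in for the outputs of a trained reward model, and the paper's actual training setup may differ in its details.

```python
import math
from dataclasses import dataclass


@dataclass
class Comparison:
    """One human judgment: the rater preferred `chosen` over `rejected`.

    Hypothetical record layout, assumed for this sketch.
    """
    prompt: str
    chosen: str
    rejected: str


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Written as softplus(-margin) and split by sign so that exp()
    never receives a large positive argument.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def agreement_rate(labels_a: list[str], labels_b: list[str]) -> float:
    """Fraction of comparisons on which two raters picked the same output."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


if __name__ == "__main__":
    example = Comparison(prompt="Explain tides.", chosen="A", rejected="B")
    # Placeholder scores; a real reward model would produce these.
    print(pairwise_loss(reward_chosen=1.2, reward_rejected=0.3))
    print(agreement_rate(["A", "B", "A", "A"], ["A", "B", "B", "A"]))
```

Ranking the preferred output higher lowers the loss, while ranking it lower is penalized roughly in proportion to the score gap. The agreement function is plain percent agreement; it does not correct for chance agreement the way a statistic such as Cohen's kappa would.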