Human Preference / Human Feedback
Judgments by human raters about which model outputs are better. In this paper, they are collected as pairwise comparisons (output A vs. output B → which is better?) and used to train the reward model and to validate alignment quality. Inter-rater agreement is typically 70–75%.
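As a rough illustration of what an inter-rater agreement figure measures, the sketch below computes the fraction of matching labels over all rater pairs that judged the same comparison. The function name `pairwise_agreement`, the data layout, and the example labels are hypothetical and chosen only for illustration; the paper's actual agreement metric may be defined differently.

```python
from itertools import combinations


def pairwise_agreement(labels_by_rater):
    """Fraction of matching labels over all rater pairs and shared comparisons.

    labels_by_rater maps a rater id to a dict of {comparison_id: "A" or "B"},
    where the label names the output that rater preferred.
    (Hypothetical layout, assumed for this sketch.)
    """
    agree = 0
    total = 0
    for r1, r2 in combinations(labels_by_rater, 2):
        # Only count comparisons that both raters actually labeled.
        shared = labels_by_rater[r1].keys() & labels_by_rater[r2].keys()
        for cid in shared:
            total += 1
            if labels_by_rater[r1][cid] == labels_by_rater[r2][cid]:
                agree += 1
    if total == 0:
        raise ValueError("no comparison was labeled by two or more raters")
    return agree / total


# Made-up labels for illustration only (not data from the paper).
example = {
    "rater_1": {"c1": "A", "c2": "B", "c3": "A", "c4": "A"},
    "rater_2": {"c1": "A", "c2": "B", "c3": "B", "c4": "A"},
    "rater_3": {"c1": "A", "c2": "A", "c3": "A", "c4": "A"},
}
rate = pairwise_agreement(example)
print(f"agreement: {rate:.2%}")
```

Raw percent agreement like this does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be one alternative if that matters for the analysis.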