Outcome Reward Model (ORM)

Appears in 2 papers

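A model that scores only the final output (right or wrong), without evaluating intermediate steps. It is less informative than a process reward model (PRM), which scores each intermediate step. However, it is simpler to train, because its training labels only need to record whether the final answer is correct.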
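The sketch below shows the kind of signal an ORM is built around. It assumes a hypothetical `Solution` type and a hypothetical `outcome_reward` helper that compares the final answer to a reference string. This is a rule-based outcome label, not a learned model; a trained ORM would estimate this correctness instead of computing it by string comparison. The point it illustrates is that the intermediate steps never affect the score.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    steps: list[str]   # intermediate reasoning steps (never inspected below)
    final_answer: str  # the only part of the solution that is scored


def outcome_reward(solution: Solution, reference_answer: str) -> float:
    """Hypothetical binary outcome label: 1.0 if the final answer matches, else 0.0.

    Only the final answer is compared; a solution with flawed reasoning but a
    correct final answer gets the same reward as a fully correct one.
    """
    return 1.0 if solution.final_answer.strip() == reference_answer.strip() else 0.0


sound = Solution(steps=["2 + 3 = 5", "5 * 4 = 20"], final_answer="20")
flawed = Solution(steps=["2 + 3 = 6", "6 * 4 = 20"], final_answer="20")
wrong = Solution(steps=["2 + 3 = 5", "5 * 4 = 25"], final_answer="25")

print(outcome_reward(sound, "20"))   # 1.0
print(outcome_reward(flawed, "20"))  # 1.0 (the faulty step "2 + 3 = 6" goes unnoticed)
print(outcome_reward(wrong, "20"))   # 0.0
```

A PRM would instead assign a score to each entry in `steps`, so it could penalize the `flawed` solution that this outcome-only signal accepts.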