Best-of-N (BoN)

Appears in 1 paper

A strategy where you generate N independent solutions to the same problem and select the best one according to some criterion (e.g., a Process Reward Model score).
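The selection step can be sketched in a few lines of Python. This is a minimal illustration, not any particular paper's implementation: `generate` and `score` are hypothetical callables standing in for a sampling call to a model and a scoring criterion such as a reward model.

```python
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[], str],    # hypothetical: draws one candidate solution
    score: Callable[[str], float],  # hypothetical: higher score means better
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each call to generate() is assumed to be an independent sample.
    candidates = [generate() for _ in range(n)]
    scored = [(score(c), c) for c in candidates]
    # max() keeps the first candidate encountered among ties.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

Because the criterion is passed in as a function, the same loop works with a reward-model score, a verifier, or any other ranking signal.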

As used in Paper 23: Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters

If each of the N samples is correct independently with probability p (the base success rate), the probability that at least one sample is correct is 1 - (1-p)^N. This is an upper bound on Best-of-N accuracy, because the selection criterion must still pick a correct sample from the N candidates.
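To see how quickly this probability grows with N, the formula can be evaluated directly. The sketch below assumes independent samples that share one success rate p; the function name is chosen here for illustration.

```python
def prob_at_least_one_correct(p: float, n: int) -> float:
    """Return 1 - (1 - p)^n for n independent samples with success rate p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # (1 - p)^n is the probability that all n samples are wrong.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative base rate only; not a figure taken from the paper.
    for n in (1, 4, 16, 64):
        print(n, round(prob_at_least_one_correct(0.2, n), 4))
```

With a base rate of 0.2, for example, four samples give 1 - 0.8^4 = 0.5904, so most of the gain comes from the first few samples and additional ones show diminishing returns.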