Test-Time Compute
Spending additional computation at inference time (rather than training time) to improve performance. Chain-of-thought is one form of test-time compute: the model uses more tokens (and therefore more computation) at inference to generate reasoning, rather than being trained on reasoning data. Later work such as OpenAI o1 pushed this much further, allocating far more test-time compute to complex reasoning.
Computation performed at inference time (when the model is answering a question), as opposed to training time. Generating multiple solutions, running search algorithms, and iteratively refining an answer all happen at test time. The paper shows that increasing test-time compute (TTC) can substitute for increasing model size.
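
As a rough illustration of "generating multiple solutions", the sketch below draws several answers from a noisy solver and takes a majority vote. The solver is a hypothetical stand-in for sampling a model (it is not from the paper), and the names `noisy_solver`, `solve_with_budget`, and the 0.6 per-sample accuracy are assumptions chosen only for the demonstration.

```python
import random
from collections import Counter


def noisy_solver(a, b, rng, p_correct=0.6):
    """Hypothetical stand-in for one model sample.

    Returns a + b with probability p_correct; otherwise returns an answer
    that is off by 1 or 2 in either direction.
    """
    if rng.random() < p_correct:
        return a + b
    return a + b + rng.choice([-2, -1, 1, 2])


def majority_vote(answers):
    """Return the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def solve_with_budget(a, b, n_samples, rng):
    """Spend more inference-time compute: draw n_samples answers, then vote."""
    samples = [noisy_solver(a, b, rng) for _ in range(n_samples)]
    return majority_vote(samples)


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer to 2 + 3 is correct."""
    rng = random.Random(seed)
    hits = sum(solve_with_budget(2, 3, n_samples, rng) == 5 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # The solver itself never changes; only the per-question sample budget does.
    for n in (1, 5, 15):
        print(n, accuracy(n))
```

In this toy setup the solver is fixed, so any gain in accuracy as `n_samples` grows comes purely from extra inference-time computation. Real systems may use a learned verifier or search procedure instead of a simple vote.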