Test-Time Compute

Appears in 2 papers

Spending additional computation at inference time (rather than training time) to improve performance.

As used in Paper 14 — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models →

Spending additional computation at inference time (rather than training time) to improve performance. Chain-of-thought is a form of test-time compute — you're using more tokens (computation) at inference to generate reasoning, rather than training the model on reasoning data. Later work like OpenAI o1 pushed this much further, allocating massive test-time compute for complex reasoning.

As used in Paper 23 — Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters →

Computation performed at inference time (when the model is answering a question), as opposed to training time. Generating multiple solutions, running search algorithms, or iterative refinement all happen at test-time. The paper shows that increasing TTC can be a substitute for increasing model size.

Paper 14 — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models → Paper 23 — Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters →

Appears in papers