Training-Time Compute

Appears in 1 paper

Computation used to train the model initially.

As used in Paper 23: Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters

The computation spent training a model before it is used, as distinct from the computation spent at inference. Larger models require more training compute. Scaling training-time compute has historically been the primary way to improve model capability, but test-time compute offers a complementary path.
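To make the link between model size and training compute concrete, the sketch below uses a widely cited rule of thumb for dense transformers: training costs roughly 6 FLOPs per parameter per training token, and inference roughly 2 FLOPs per parameter per generated token. This approximation is an assumption brought in for illustration and is not stated in this entry. The function names and the example sizes are hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs, assuming the ~6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def inference_flops(n_params: float, n_tokens: float) -> float:
    """Approximate inference FLOPs, assuming ~2 * N per generated token."""
    return 2.0 * n_params * n_tokens


# Hypothetical sizes, chosen only for illustration.
TOKENS = 2e10          # training tokens, held fixed across both models
SMALL_PARAMS = 1e9     # 1B-parameter model
LARGE_PARAMS = 1e10    # 10B-parameter model

small_cost = training_flops(SMALL_PARAMS, TOKENS)
large_cost = training_flops(LARGE_PARAMS, TOKENS)

# With the token count fixed, training compute grows linearly with parameters.
cost_ratio = large_cost / small_cost
```

Under this approximation, a model with ten times the parameters needs about ten times the training compute for the same data, which is why spending extra compute at test time on a smaller model can be an alternative worth weighing.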