Accuracy / Performance Metric

Appears in 1 paper

In the paper, accuracy is the percentage of benchmark problems the model solves correctly: the number of correct answers divided by the total number of problems, times 100.

As used in Paper 14: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

On GSM8K, for example, PaLM achieved 25% accuracy with standard prompting and 58% with chain-of-thought (CoT) prompting. That is an absolute gain of 33 percentage points and, in relative terms, more than a 2× improvement (58 / 25 ≈ 2.3), which is considered a very large jump in the field. Accuracy is measured on held-out test sets, so the model has not seen the specific problems during training.
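The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not the paper's evaluation code: the function names `accuracy` and `compare` are hypothetical, the toy answers are made up, and scoring by exact string match is an assumption about how a final answer might be judged correct.

```python
def accuracy(predictions, references):
    """Percentage of predictions that match their reference answer.

    Assumption: an answer counts as correct only on exact match.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


def compare(baseline_acc, new_acc):
    """Return (absolute gain in percentage points, relative ratio)."""
    absolute_gain = new_acc - baseline_acc
    relative_ratio = new_acc / baseline_acc
    return absolute_gain, relative_ratio


# Toy held-out set (made-up answers): 3 of 4 predictions are correct.
toy_references = ["18", "3", "70000", "540"]
toy_predictions = ["18", "4", "70000", "540"]
print(accuracy(toy_predictions, toy_references))  # 75.0

# The GSM8K figures quoted in this entry: 25% (standard) vs 58% (CoT).
gain, ratio = compare(25.0, 58.0)
print(gain, round(ratio, 2))  # 33.0 2.32
```

Keeping the absolute gain (percentage points) and the relative ratio separate avoids a common confusion: a 33-point gain is not a "33% improvement".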