MMLU (Massive Multitask Language Understanding)
A benchmark spanning 57 diverse academic subjects (including history, law, science, and medicine) with 14,042 multiple-choice questions. Baselines: 25% random guessing (each question has four answer choices), 86.4% GPT-4, 89.8% human expert, 90.04% Gemini Ultra.
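
The random-guessing baseline follows from the question format: with four answer choices, a uniform guess is right one time in four. A minimal sketch of that arithmetic, together with a plain accuracy score, might look like the following. The function names (`random_baseline`, `accuracy`, `simulate_random_guessing`) are hypothetical and are not part of any official MMLU tooling; the simulated answer key is synthetic, not real benchmark data.

```python
import random

# Assumption for illustration: every question has four options (A-D),
# encoded here as the integers 0..3.
NUM_CHOICES = 4


def random_baseline(num_choices: int = NUM_CHOICES) -> float:
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_choices


def accuracy(predictions, answers) -> float:
    """Fraction of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("answer key must not be empty")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def simulate_random_guessing(num_questions: int = 14042, seed: int = 0) -> float:
    """Score uniform random guesses against a synthetic answer key."""
    rng = random.Random(seed)
    answers = [rng.randrange(NUM_CHOICES) for _ in range(num_questions)]
    guesses = [rng.randrange(NUM_CHOICES) for _ in range(num_questions)]
    return accuracy(guesses, answers)


if __name__ == "__main__":
    print(f"analytic baseline:  {random_baseline():.2%}")
    print(f"simulated baseline: {simulate_random_guessing():.2%}")
```

The simulated score should land close to the analytic 25%, which is the floor against which the model and human-expert figures above are read.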