Jacob Steinhardt
Broad capability evaluation (MMLU)
Co-authored MMLU (Massive Multitask Language Understanding), a widely used benchmark of general LLM capability across 57 subjects.
Highlights
Evaluation, Benchmarks, LLMs
Focus: Broad capability evaluation (MMLU)
Why it matters: MMLU, which he co-authored, is a widely used benchmark of general LLM capability across 57 subjects.
Research Areas
Evaluation, Benchmarks, LLMs