
Jacob Steinhardt

Broad capability evaluation (MMLU)

Co-authored MMLU (Massive Multitask Language Understanding), a widely used benchmark for general LLM capability across many subjects.

Highlights

Evaluation, Benchmarks, LLMs
Focus: Broad capability evaluation (MMLU)
Why it matters: Co-authored MMLU, a widely used benchmark for general LLM capability across many subjects.

Research Areas

Evaluation, Benchmarks, LLMs