Topic

Evaluation & Benchmarks

People building the measurement systems, benchmarks, and red-team style checks used to understand AI systems.

Start with Nicholas Carlini, Jared Kaplan, and Dawn Drain for the clearest first pass through evaluation & benchmarks as it shows up in practice.

This area overlaps heavily with Anthropic, OpenAI, and AI21 Labs. Common institution signals include Anthropic, AI21 Labs, and Google DeepMind. Recurring starting points include the Holistic Evaluation of Language Models (HELM) project.

Snapshot

Researchers: 244
Related labs: 7
Starting points: 8
Developed dossiers: 43

Institution Signals

Frequent institutions showing up across profiles in this area.

Anthropic (35) · AI21 Labs (11) · Google DeepMind (8) · Google (6) · OpenAI (6) · Stanford University (6) · EleutherAI (4) · Allen Institute (2)

Canonical Starting Points

Papers, project pages, and repositories that recur across this part of the field.

Frequently Linked Sources

Source clusters that repeatedly anchor researchers in this area.

Researchers To Start With

A curated first pass through evaluation & benchmarks, ranked by profile depth, evidence, and editorial importance.

All Researchers In This Topic

244 linked profiles.