Topic
People building the measurement systems, benchmarks, and red-team-style checks used to understand AI systems.
Start with Nicholas Carlini, Jared Kaplan, and Dawn Drain for the clearest first pass through evaluation & benchmarks as it shows up in practice.
This area overlaps heavily with work at Anthropic, OpenAI, and AI21. The institutions that recur most often across profiles are Anthropic, AI21 Labs, and Google DeepMind. Recurring starting points include Holistic Evaluation of Language Models and HELM (project).
Snapshot
Researchers: 244
Related labs: 7
Starting points: 8
Developed dossiers: 43
Useful entry points pulled from the strongest linked researcher dossiers.
Adversarial ML and extraction risks (via Nicholas Carlini)
Scaling laws for language models (via Jared Kaplan)
Assistant alignment research (via Dawn Drain)
Helpful and harmless assistant training (via Danny Hernandez)
Grounded language and multimodal learning (via Angeliki Lazaridou)
Applying frontier AI to science and public-interest problems (via Pushmeet Kohli)
Frequent institutions showing up across profiles in this area include Anthropic, AI21 Labs, and Google DeepMind.
Papers, project pages, and repositories that recur across this part of the field.
Holistic Evaluation of Language Models (linked by 46 profiles in this topic)
HELM (project) (linked by 44 profiles in this topic)
Evaluating Large Language Models Trained on Code (linked by 39 profiles in this topic)
Constitutional AI: Harmlessness from AI Feedback (linked by 33 profiles in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (linked by 33 profiles in this topic)
Discovering Language Model Behaviors with Model-Written Evaluations (linked by 23 profiles in this topic)
Source clusters that repeatedly anchor researchers in this area.
HELM (project) (used across 44 researcher pages in this topic)
Holistic Evaluation of Language Models (used across 44 researcher pages in this topic)
Evaluating Large Language Models Trained on Code (used across 37 researcher pages in this topic)
Constitutional AI: Harmlessness from AI Feedback (used across 33 researcher pages in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (used across 33 researcher pages in this topic)
Discovering Language Model Behaviors with Model-Written Evaluations (used across 20 researcher pages in this topic)
A stronger first pass through evaluation & benchmarks, ranked by profile depth, evidence, and editorial importance.
Adversarial ML, security of deployed models
One of the most useful people to study if you care about what deployed models get wrong under pressure, especially around extraction, adversarial behavior, and practical security failures.
Scaling laws, LLM training dynamics
One of the clearest anchors for understanding why scaling laws became such a central planning tool for frontier-model research and training strategy.
Alignment via AI feedback (Constitutional AI)
Useful for tracing the seam between Anthropic’s earlier alignment papers and its later audit-oriented safety work, where interpretability and evaluation start feeding into deployment practice.
Alignment via AI feedback (Constitutional AI)
A strong person to follow for how Anthropic moved from assistant training into more explicit evaluation work around model behavior, red-teaming, and chain-of-thought faithfulness.
Gemini (multimodal foundation models)
A high-signal researcher for grounded language and retrieval-heavy systems, especially if you want to understand how language models stay useful as the world changes around them.
Robotics, vision, structured prediction
A strong person to follow if you want to understand how frontier AI gets pushed into science, security, and trustworthy deployment rather than staying inside benchmark culture.
NLP, language understanding
A foundational NLP researcher whose work matters both for classic representation learning and for institution-building around the modern Stanford NLP ecosystem.
Alignment via AI feedback (Constitutional AI)
Important because his work sits near the point where technical alignment, evaluation practice, and the public case for safer frontier-model deployment meet.
Open-source LLMs (EleutherAI)
Useful to follow if you care about the practical evaluation layer of open models, especially where benchmark tooling and reproducible comparisons actually shape what the ecosystem measures.
244 linked profiles.