Topic
Researchers working on decision-making, planning, self-play, and RL methods that still shape modern AI systems.
Start with Demis Hassabis, Chris Olah, and Dario Amodei if you want the clearest first pass through reinforcement learning as it shows up in practice.
This area overlaps heavily with Anthropic, Google DeepMind, and AI21. Common institution signals include Anthropic, Google DeepMind, and Google. Recurring starting points include Constitutional AI: Harmlessness from AI Feedback and Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Snapshot
Researchers: 90
Related labs: 6
Starting points: 8
Developed dossiers: 27
Useful entry points pulled from the strongest linked researcher dossiers.
Deep reinforcement learning (via Demis Hassabis)
Feature visualization and interpretability (via Chris Olah)
Frontier-model scaling and deployment tradeoffs (via Dario Amodei)
Behavior shaping in large models (via Amanda Askell)
Reward modeling (via Paul Christiano)
Policy optimization and reinforcement learning (via John Schulman)
Frequent institutions showing up across profiles in this area.
Papers, project pages, and repositories that recur across this part of the field.
Constitutional AI: Harmlessness from AI Feedback (linked by 48 profiles in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (linked by 47 profiles in this topic)
Discovering Language Model Behaviors with Model-Written Evaluations (linked by 22 profiles in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (linked by 20 profiles in this topic)
Challenges in evaluating AI systems (linked by 9 profiles in this topic)
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning (linked by 9 profiles in this topic)
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (linked by 8 profiles in this topic)
Measuring Faithfulness in Chain-of-Thought Reasoning (linked by 8 profiles in this topic)
Source clusters that repeatedly anchor researchers in this area.
Constitutional AI: Harmlessness from AI Feedback (used across 48 researcher pages in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (used across 47 researcher pages in this topic)
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (used across 6 researcher pages in this topic)
Deep Reinforcement Learning from Human Preferences (used across 4 researcher pages in this topic)
Playing Atari with Deep Reinforcement Learning (used across 4 researcher pages in this topic)
Reflexion: Language Agents with Verbal Reinforcement Learning (used across 4 researcher pages in this topic)
A stronger first pass through reinforcement learning, ranked by profile depth, evidence, and editorial importance.
Demis Hassabis: Deep RL, scientific AI, leadership
Important both as a researcher and as an institution builder whose long-running agenda tied deep RL, multimodal systems, and scientific AI into one coherent lab strategy.
Chris Olah: Mechanistic interpretability, visualization
One of the clearest interpreters of neural-network internals, especially in the line of work that turned interpretability into a concrete research agenda rather than a vague aspiration.
Dario Amodei: Alignment, post-training, frontier LLMs
A high-signal figure for understanding the frontier-model era because his work sits at the intersection of scaling, post-training, and deployment-risk framing.
Amanda Askell: Alignment, behavior shaping, safety
A high-signal researcher for understanding how post-training and behavioral steering become concrete product behavior rather than abstract alignment talk.
Paul Christiano: Alignment theory, reward modeling
A foundational thinker in oversight, reward modeling, and delegation-style alignment ideas that influenced much of the modern post-training conversation.
John Schulman: Reinforcement learning, post-training
A key bridge between reinforcement-learning methodology and the post-training techniques now used to shape assistant behavior.
Scaling laws, LLM training dynamics
One of the clearest anchors for understanding why scaling laws became such a central planning tool for frontier-model research and training strategy.
Alignment research, scalable oversight
One of the clearest public anchors for scalable oversight and alignment research in the frontier-model era.
Alignment via AI feedback (Constitutional AI)
High-signal for the seam between machine learning and hardware systems, especially where learned optimization methods begin affecting the actual compute infrastructure underneath frontier models.
90 linked profiles.