Interpretability

Researchers working to open the black box of neural networks and make model internals more legible.

Start with Chris Olah, Dawn Drain, and Catherine Olsson for the clearest first pass through interpretability as it shows up in practice.

This area overlaps heavily with Anthropic, AI21, and EleutherAI. Common institution signals include Anthropic, Conjecture, and the Kempner Institute for the Study of Natural and Artificial Intelligence. Recurring starting points include "Constitutional AI: Harmlessness from AI Feedback" and "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback".

Snapshot

Researchers: 8
Related labs: 3
Starting points: 8
Developed dossiers: 3

Institution Signals

Institutions that appear frequently across researcher profiles in this area.

Anthropic (6), Conjecture (1), Kempner Institute for the Study of Natural and Artificial Intelligence (1), Technion (1)

Canonical Starting Points

Papers, project pages, and repositories that recur across this part of the field.

Frequently Linked Sources

Source clusters that repeatedly anchor researcher profiles in this area.

Researchers To Start With

A curated first pass through interpretability, ranked by profile depth, evidence, and editorial importance.

All Researchers In This Topic

8 linked profiles.