Lab & Ecosystem
Alignment, post-training, and frontier assistant researchers with a strong safety and behavior focus.
Within 500AI, Anthropic is most legible through researchers like Dario Amodei, Amanda Askell, and Jack Clark.
This cluster is especially tied to Post-Training & Alignment, Evaluation & Benchmarks, and Reinforcement Learning. Frequent institution signals include Anthropic, Google, and AISLE. Recurring entry points include Constitutional AI: Harmlessness from AI Feedback and Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Snapshot
Researchers: 71
Related topics: 8
Starting points: 8
Developed dossiers: 14
Useful lenses pulled from the strongest researcher profiles in this cluster.
Frontier-model scaling and deployment tradeoffs
Via Dario Amodei
Behavior shaping in large models
Via Amanda Askell
Frontier-lab analysis
Via Jack Clark
Scaling laws for language models
Via Jared Kaplan
Chip placement with deep reinforcement learning
Via Azalia Mirhoseini
Assistant alignment research
Via Dawn Drain
Frequent institutions showing up across linked profiles in this ecosystem: Anthropic, Google, and AISLE.
Repeatedly linked papers, projects, and repositories across this lab cluster.
Constitutional AI: Harmlessness from AI Feedback
Linked by 49 profiles in this cluster
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Linked by 47 profiles in this cluster
Discovering Language Model Behaviors with Model-Written Evaluations
Linked by 23 profiles in this cluster
Discovering Language Model Behaviors with Model-Written Evaluations
Linked by 22 profiles in this cluster
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Linked by 20 profiles in this cluster
Challenges in evaluating AI systems
Linked by 9 profiles in this cluster
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Linked by 9 profiles in this cluster
Measuring Faithfulness in Chain-of-Thought Reasoning
Linked by 8 profiles in this cluster
Source clusters that repeatedly anchor researcher pages in this ecosystem.
Constitutional AI: Harmlessness from AI Feedback
Used across 48 researcher pages in this lab cluster
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Used across 47 researcher pages in this lab cluster
Discovering Language Model Behaviors with Model-Written Evaluations
Used across 20 researcher pages in this lab cluster
Jack Clark (website)
Used across 1 researcher page in this lab cluster
Scaling Laws for Neural Language Models
Used across 1 researcher page in this lab cluster
A stronger first pass through Anthropic, ranked by profile depth, evidence, and editorial importance.
Alignment, post-training, frontier LLMs
A high-signal figure for understanding the frontier model era because his work sits at the intersection of scaling, post-training, and deployment-risk framing.
Alignment, behavior shaping, safety
A high-signal researcher for understanding how post-training and behavioral steering become concrete product behavior rather than abstract alignment talk.
AI policy, frontier-lab strategy, analysis
Useful not just for his own technical work, but because he consistently translates frontier research, deployment shifts, and policy implications into a coherent field-level picture.
Scaling laws, LLM training dynamics
One of the clearest anchors for understanding why scaling laws became such a central planning tool for frontier-model research and training strategy.
Learned optimization, ML for hardware systems
High-signal for the seam between machine learning and hardware systems, especially where learned optimization methods begin affecting the actual compute infrastructure underneath frontier models.
Alignment via AI feedback (Constitutional AI)
Useful for the seam between Anthropic’s earlier alignment papers and its later audit-oriented safety work, where interpretability and evaluation start feeding into deployment practice.
Chip placement, systems-aware optimization
A strong person to follow for the point where machine learning research starts shaping the compute stack itself, especially in chip placement and systems-aware optimization.
Mechanistic interpretability
One of the clearest people to follow if you want the mechanistic-interpretability thread at Anthropic rather than only its safety-policy surface.
Evaluation, red-teaming, chain-of-thought faithfulness
A strong person to follow for how Anthropic moved from assistant training into more explicit evaluation work around model behavior, red-teaming, and chain-of-thought faithfulness.
71 linked profiles.