Topic
People building the measurement systems, benchmarks, and red-team-style checks used to understand AI systems.
Start with Nicholas Carlini, Jared Kaplan, and Dawn Drain for the clearest first pass through evaluation & benchmarks as it shows up in practice.
This area overlaps heavily with work at Anthropic, OpenAI, and AI21. The institutions that recur most often across profiles are Anthropic, AI21 Labs, and Google DeepMind. Recurring starting points include Holistic Evaluation of Language Models and HELM (project).
Snapshot
Researchers: 244
Related labs: 7
Starting points: 8
Developed dossiers: 43
Useful entry points pulled from the strongest linked researcher dossiers.
Adversarial ML and extraction risks (via Nicholas Carlini)
Scaling laws for language models (via Jared Kaplan)
Assistant alignment research (via Dawn Drain)
Helpful and harmless assistant training (via Danny Hernandez)
Grounded language and multimodal learning (via Angeliki Lazaridou)
Applying frontier AI to science and public-interest problems (via Pushmeet Kohli)
Frequent institutions showing up across profiles in this area include Anthropic, AI21 Labs, and Google DeepMind.
Papers, project pages, and repositories that recur across this part of the field.
Holistic Evaluation of Language Models (linked by 46 profiles in this topic)
HELM (project) (linked by 44 profiles in this topic)
Evaluating Large Language Models Trained on Code (linked by 39 profiles in this topic)
Constitutional AI: Harmlessness from AI Feedback (linked by 33 profiles in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (linked by 33 profiles in this topic)
Discovering Language Model Behaviors with Model-Written Evaluations (linked by 23 profiles in this topic)
Source clusters that repeatedly anchor researchers in this area.
HELM (project) (used across 44 researcher pages in this topic)
Holistic Evaluation of Language Models (used across 44 researcher pages in this topic)
Evaluating Large Language Models Trained on Code (used across 37 researcher pages in this topic)
Constitutional AI: Harmlessness from AI Feedback (used across 33 researcher pages in this topic)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (used across 33 researcher pages in this topic)
Discovering Language Model Behaviors with Model-Written Evaluations (used across 20 researcher pages in this topic)
A stronger first pass through evaluation & benchmarks, ranked by profile depth, evidence, and editorial importance.
Adversarial ML, security of deployed models
One of the most useful people to study if you care about what deployed models get wrong under pressure, especially around extraction, adversarial behavior, and practical security failures.
Scaling laws, LLM training dynamics
One of the clearest anchors for understanding why scaling laws became such a central planning tool for frontier-model research and training strategy.
Alignment via AI feedback (Constitutional AI)
Useful for tracing the seam between Anthropic’s earlier alignment papers and its later audit-oriented safety work, where interpretability and evaluation start feeding into deployment practice.
Alignment via AI feedback (Constitutional AI)
A strong person to follow for how Anthropic moved from assistant training into more explicit evaluation work around model behavior, red-teaming, and chain-of-thought faithfulness.
Gemini (multimodal foundation models)
A high-signal researcher for grounded language and retrieval-heavy systems, especially if you want to understand how language models stay useful as the world changes around them.
Robotics, vision, structured prediction
A strong person to follow if you want to understand how frontier AI gets pushed into science, security, and trustworthy deployment rather than staying inside benchmark culture.
NLP, language understanding
A foundational NLP researcher whose work matters both for classic representation learning and for institution-building around the modern Stanford NLP ecosystem.
Alignment via AI feedback (Constitutional AI)
Important because his work sits near the point where technical alignment, evaluation practice, and the public case for safer frontier-model deployment meet.
Open-source LLMs (EleutherAI)
Useful to follow if you care about the practical evaluation layer of open models, especially where benchmark tooling and reproducible comparisons actually shape what the ecosystem measures.
244 linked profiles.