Topic

Security & Robustness

Researchers studying adversarial behavior, model extraction, jailbreaks, robustness, and practical deployment risks.

For the clearest first pass through security & robustness as it shows up in practice, start with Nicholas Carlini, Pushmeet Kohli, and Ethan Perez.

This area overlaps heavily with Anthropic, Google DeepMind, and AI21. Common institution signals include Anthropic, Google DeepMind, and Google. Recurring starting points include "Constitutional AI: Harmlessness from AI Feedback" and "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback".

Snapshot

Researchers: 34

Related labs: 4

Starting points: 8

Developed dossiers: 13

Institution Signals

Frequent institutions showing up across profiles in this area.

Anthropic (8), Google DeepMind (4), Google (3), OpenAI (3), AISLE (1), Center for AI Policy (1), Chapman University (1), Crisis24 (1)

Canonical Starting Points

Papers, project pages, and repositories that recur across this part of the field.

Frequently Linked Sources

Source clusters that repeatedly anchor researchers in this area.

Researchers To Start With

A stronger first pass through security & robustness, ranked by profile depth, evidence, and editorial importance.

Kenton Lee

NLP systems and evaluation

4 sources

A strong person to follow for practical language systems because his work sits right at the intersection of pretraining, retrieval, and question answering, where product-grade NLP systems either become robust or fall apart.

All Researchers In This Topic

34 linked profiles.