
Saffron Huang

Red teaming with language models

Co-authored "Red Teaming LMs with LMs", a concrete approach to stress-testing model behavior at scale.

Highlights

Safety, Red teaming, Evaluation
Focus: Red teaming with language models
Why it matters: Co-authored "Red Teaming LMs with LMs", a concrete approach to stress-testing model behavior at scale.

Research Areas

Safety, Red teaming, Evaluation