Saffron Huang
Red teaming with language models
Co-authored "Red Teaming LMs with LMs," a paper presenting a concrete approach to stress-testing model behavior at scale.
Highlights
Safety, Red teaming, Evaluation
Focus: Red teaming with language models
Why it matters: Co-authored "Red Teaming LMs with LMs," a paper presenting a concrete approach to stress-testing model behavior at scale.
Research Areas
Safety, Red teaming, Evaluation