Roger Grosse
Model-written evaluations for LM behavior
Co-authored work on model-written evals, a practical technique for discovering and measuring LM behaviors.
Highlights
Anthropic, Evaluation, Safety, Alignment
Focus: Model-written evaluations for LM behavior
Why it matters: Co-authored work on model-written evals, a practical technique for discovering and measuring LM behaviors.
Research Areas
Anthropic, Evaluation, Safety, Alignment