
Da Yan

Model-written evaluations for LM behavior

Co-authored work on model-written evaluations, a practical technique for discovering and measuring language model (LM) behaviors.

Highlights

Anthropic, Evaluation, Safety, Alignment
Focus: Model-written evaluations for LM behavior
Why it matters: Model-written evaluations offer a practical way to discover and measure LM behaviors.
