Kexin Pei

Measuring real-world coding ability (SWE-bench)

Co-authored SWE-bench, a key benchmark for testing whether models can resolve real GitHub issues.

Highlights

Evaluation, SWE-bench, Code, Agents
Focus: Measuring real-world coding ability (SWE-bench)
Why it matters: SWE-bench, which Pei co-authored, is a key benchmark for testing whether models can resolve real GitHub issues.

Research Areas

Evaluation, SWE-bench, Code, Agents