Kexin Pei
Measuring real-world coding ability (SWE-bench)
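Co-authored SWE-bench, a key benchmark for whether language models can resolve real GitHub issues: given an issue and its repository, a model must produce a patch that makes the associated tests pass.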
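A minimal sketch of how a SWE-bench-style resolution rate could be computed, assuming each task records two test groups: tests that failed before the fix and must now pass ("fail-to-pass"), and tests that already passed and must keep passing ("pass-to-pass"). The `TaskResult` type and the `is_resolved` and `resolve_rate` functions are hypothetical names for illustration, not the official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-issue test outcomes after applying a model's patch."""
    instance_id: str
    # Tests that failed before the reference fix; all must pass with the patch.
    fail_to_pass: dict[str, bool]
    # Tests that already passed; all must still pass (no regressions).
    pass_to_pass: dict[str, bool]


def is_resolved(result: TaskResult) -> bool:
    # An issue counts as resolved only if both test groups fully pass.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks resolved; an empty run scores 0.0.
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    results = [
        TaskResult("task-a", {"test_fix": True}, {"test_old": True}),
        TaskResult("task-b", {"test_fix": False}, {"test_old": True}),
    ]
    print(resolve_rate(results))
```

Requiring the pass-to-pass group as well means a patch that fixes the issue but breaks existing behavior still counts as a failure.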
Highlights
Evaluation, SWE-bench, Code, Agents