
Archit Sharma

Direct preference optimization (DPO)

Co-authored the DPO paper, which proposes a simple alternative to the usual reward-model-plus-RLHF training loop: the policy is optimized directly on preference pairs, with no separate reward model and no reinforcement learning step.
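The core of DPO is a logistic loss on the implicit reward margin between a chosen and a rejected response, where the implicit reward is beta times the log-ratio of the policy to a frozen reference model. A minimal per-example sketch (function and parameter names are mine, and log-probabilities are treated as precomputed scalars for illustration):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid of the reward margin."""
    # Implicit reward of each response: beta * log(pi / pi_ref)
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry-style logistic loss on the margin
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy widens the chosen-vs-rejected gap relative to the
# reference, the margin is positive and the loss drops below log 2.
low = dpo_loss(-5.0, -9.0, -6.0, -8.0)
high = dpo_loss(-9.0, -5.0, -8.0, -6.0)
```

Minimizing this loss pushes the policy to assign relatively more probability to the chosen response than the reference does, which is what the reward model plus RL step would otherwise accomplish.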


Research Areas

DPO · Preferences · Post-training · Alignment