Rafael Rafailov
Direct preference optimization (DPO)
Co-authored the DPO paper, which replaces the reward-model-plus-RLHF pipeline with a single supervised objective trained directly on preference data.
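The DPO objective mentioned above can be sketched in a few lines: it scores a preferred and a dispreferred completion by their policy-vs-reference log-probability ratios and applies a logistic loss to the margin. This is a minimal sketch; the function name, argument names, and `beta` default are illustrative, not taken from this page.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta scales the implicit reward; names here are illustrative.
    """
    # Implicit rewards: beta-scaled log-ratios of policy vs. frozen reference
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Negative log-sigmoid of the margin: a logistic loss on the preference
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference on both completions the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen completion, the loss falls, which is why no separate reward model or RL loop is needed.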
Research Areas
DPO · Preferences · Post-training · Alignment