Rafael Rafailov
Direct preference optimization (DPO)
Co-authored the DPO paper, which replaces the reward-model-plus-RLHF pipeline with a single supervised objective trained directly on preference data.
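The DPO objective mentioned above can be sketched in a few lines: it scores a preferred and a dispreferred completion by their policy-vs-reference log-probability ratios and applies a logistic loss to the margin. This is a minimal sketch; the function name, argument names, and `beta` default are illustrative, not taken from this page.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta scales the implicit reward; names here are illustrative.
    """
    # Implicit rewards: beta-scaled log-ratios of policy vs. frozen reference
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Negative log-sigmoid of the margin: a logistic loss on the preference
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference on both completions the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen completion, the loss falls, which is why no separate reward model or RL loop is needed.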
Research Areas
DPO · Preferences · Post-training · Alignment