Miljan Martic

Practical RL from human feedback

Co-authored "Deep Reinforcement Learning from Human Preferences" (2017), an early anchor for RLHF-style post-training.

Highlights

RLHF, Alignment, Preference learning
Focus: Practical RL from human feedback
Why it matters: Co-authored "Deep Reinforcement Learning from Human Preferences" (2017), an early anchor for RLHF-style post-training.

Research Areas

RLHF, Alignment, Preference learning