Shane Legg
Practical RL from human feedback
Co-authored "Deep Reinforcement Learning from Human Preferences" (2017), an early anchor for RLHF-style post-training.
Highlights
RLHF, Alignment, Preference learning
Focus: Practical RL from human feedback
Why it matters: Co-authored "Deep Reinforcement Learning from Human Preferences" (2017), an early anchor for RLHF-style post-training.
Research Areas
RLHF, Alignment, Preference learning