Neel Nanda
Training helpful, harmless assistants via RLHF
Co-authored an early RLHF recipe for training helpful and harmless assistants.
Highlights
Anthropic, RLHF, Alignment, Post-training
Focus: Training helpful, harmless assistants via RLHF
Why it matters: Co-authored an early RLHF recipe for training helpful and harmless assistants.
Research Areas
Anthropic, RLHF, Alignment, Post-training