Neel Nanda

Training helpful, harmless assistants via RLHF

Co-authored an early RLHF recipe for training helpful and harmless assistants.

Highlights

Anthropic, RLHF, Alignment, Post-training
Focus: Training helpful, harmless assistants via RLHF
Why it matters: Co-authored an early RLHF recipe for training helpful and harmless assistants.

Research Areas

Anthropic, RLHF, Alignment, Post-training