
Sainbayar Sukhbaatar

Self-rewarding post-training

Co-authored "Self-Rewarding Language Models," which explores how a language model can improve itself by serving as its own reward model.

Highlights

Research areas: Post-training, Alignment, Preferences
Focus: Self-rewarding post-training

Sainbayar Sukhbaatar - AI Researcher Profile | 500AI