reward modeling
2 articles · 3 co-occurring · 1 contradiction · 5 briefs
Rubric-based rewards break down desired model behavior into clear criteria that LLM judges use to give better feedback. This method improves reinforcement learning by making rewards more reliable.
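A minimal sketch of how rubric-based rewards could be aggregated, assuming a hypothetical per-criterion judge. The rubric criteria, weights, and the `judge_criterion` stub (a toy keyword check standing in for an LLM-judge call) are illustrative assumptions, not details from the articles:

```python
# Hypothetical rubric: (criterion name, weight). Weights sum to 1.0.
RUBRIC = [
    ("factuality", 0.4),
    ("helpfulness", 0.4),
    ("safety", 0.2),
]

def judge_criterion(response: str, criterion: str) -> float:
    """Score one criterion in [0, 1].

    Placeholder for an LLM-judge call: a real system would prompt
    a judge model with the criterion and the response. Here, a toy
    keyword heuristic keeps the sketch self-contained.
    """
    return 1.0 if criterion in response.lower() else 0.0

def rubric_reward(response: str) -> float:
    # Weighted sum of per-criterion judgments yields the scalar
    # reward passed to the RL optimizer.
    return sum(w * judge_criterion(response, c) for c, w in RUBRIC)

print(rubric_reward("A factuality-focused, helpfulness-minded, safety-aware reply."))
```

Breaking the judgment into weighted criteria is what makes the feedback auditable: a low reward can be traced to the specific criterion that failed, rather than an opaque scalar.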
[STRONG] "Its scalar reward-driven architecture leads to reward hacking and poor robustness" — Article argues reward optimization mechanisms are fundamentally flawed and lead to alignment problems