RationalRewards: Reward Models That Critique Before Scoring Unlock Hidden Image Generator Quality

TIGER-AI-Lab’s 8B reward model generates multi-dimensional critiques before scoring images and trains on 10-20x less data. Its zero-cost test-time prompt refinement matches or exceeds full RL fine-tuning on several benchmarks.
artificial-intelligence
Author

Kabui, Charles

Published

2026-04-26

Keywords

reward-models, image-generation, reinforcement-learning, test-time-compute, preference-learning