Score 5.7 · Source: cs.CL updates on arXiv.org · Published 2026-04-14
Scoring rationale: above average; offers some new information and reference value
Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty
arXiv:2604.10072v1 Announce Type: new Abstract: Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This introduces unnecessary computational costs for tasks amenable to fast, direct…