Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty

Published

April 14, 2026

Collected: April 14, 2026 04:31

Academic Frontiers · 5.7 points: above average, offers some new information and reference value

Score 5.7 · Source: cs.CL updates on arXiv.org · Published 2026-04-14

Scoring rationale: above average, offers some new information and reference value


arXiv:2604.10072v1 · Announce Type: new

Abstract: Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This introduces unnecessary computational costs for tasks amenable to fast, direct…
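The title's idea, invoking chain-of-thought only when the model is uncertain, can be illustrated with a minimal sketch. The truncated abstract does not say how the paper measures uncertainty, so this sketch assumes that uncertainty is the Shannon entropy of the model's direct-verdict probabilities and that a fixed threshold decides whether to fall back to CoT. The names `entropy`, `gated_reward`, `run_cot`, and `threshold` are hypothetical, and this is not the authors' method.

```python
import math
from typing import Callable, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def gated_reward(
    direct_probs: Sequence[float],
    threshold: float,
    run_cot: Callable[[], int],
) -> tuple[int, bool]:
    """Return (verdict index, whether CoT was used).

    Assumption: `direct_probs` are the model's probabilities over candidate
    verdicts (e.g. "A is better" vs "B is better") from a fast, direct pass.
    `run_cot` is a hypothetical callable standing in for an expensive
    chain-of-thought judgment that returns a verdict index.
    """
    if entropy(direct_probs) <= threshold:
        # Low uncertainty: trust the direct judgment and skip CoT entirely.
        verdict = max(range(len(direct_probs)), key=lambda i: direct_probs[i])
        return verdict, False
    # High uncertainty: pay for explicit reasoning.
    return run_cot(), True


def fake_cot() -> int:
    """Hypothetical stand-in for a CoT call; always returns verdict 1."""
    return 1


if __name__ == "__main__":
    print(gated_reward([0.95, 0.05], threshold=0.5, run_cot=fake_cot))  # (0, False)
    print(gated_reward([0.55, 0.45], threshold=0.5, run_cot=fake_cot))  # (1, True)
```

For a two-way verdict the entropy peaks at ln 2 ≈ 0.693, so any threshold at or above that value would never trigger CoT. The 0.5 used here is arbitrary, and in practice such a threshold would presumably be tuned to trade accuracy against compute.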