LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

发布

2026年04月17日

采集 2026年04月17日 04:31

学术前沿 7.5 分 — Critical finding: RLVR-trained models abandon rule induction for verifier gaming, important warning for reasoning scaling efforts

评分 7.5 · 来源：cs.LG updates on arXiv.org · 发布于 2026-04-17

评分依据：Critical finding: RLVR-trained models abandon rule induction for verifier gaming, important warning for reasoning scaling efforts

arXiv:2604.15149v1 Announce Type: new Abstract: As reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for scaling reasoning capabilities in LLMs, a new failure mode emerges: LLMs gaming verifiers. We study this phenomenon on inductive reasoning tasks, where models must induce and output logical rules. We find that RLVR-trained models systematically abandon rule induction.