评分 7.5 · 来源:cs.LG updates on arXiv.org · 发布于 2026-04-17
评分依据:Critical finding: RLVR-trained models abandon rule induction for verifier gaming, important warning for reasoning scaling efforts
arXiv:2604.15149v1 Announce Type: new Abstract: As reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for scaling reasoning capabilities in LLMs, a new failure mode emerges: LLMs gaming verifiers. We study this phenomenon on inductive reasoning tasks, where models must induce and output logical rules. We find that RLVR-trained models systematically abandon rule induction.