Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis

Published: 2026-04-14

Collected: 2026-04-14 04:31

Academic Frontier, 5.8 points. Above average: offers some new information and reference value.

Score: 5.8 · Source: cs.AI updates on arXiv.org · Published: 2026-04-14

arXiv:2604.11056v1 (announce type: cross)

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning ability of Large Language Models (LLMs). However, its sparse outcome-based rewards pose a fundamental credit assignment problem. We analyze this problem through the joint lens of reward polarity and token entropy. Our diagnostic tool, the Four Quadrant Decomposition, isolates token updates by polarity and entropy, and controlled ablations show that…
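The abstract is cut off before it says how the Four Quadrant Decomposition is computed, so the following is only an illustrative sketch of what partitioning tokens by reward polarity and token entropy might look like. The function names, the fixed `entropy_threshold`, the use of a single per-response reward sign, and the masking helper are all assumptions made for this example and are not taken from the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def four_quadrant_labels(token_probs, reward_positive, entropy_threshold):
    """Label each token of one response with a (polarity, entropy) quadrant.

    token_probs: list of per-token probability distributions.
    reward_positive: True if the verifier accepted the response
        (assumption: one outcome reward shared by every token).
    entropy_threshold: hypothetical cutoff between low and high entropy.
    """
    polarity = "pos" if reward_positive else "neg"
    labels = []
    for probs in token_probs:
        level = "high" if token_entropy(probs) >= entropy_threshold else "low"
        labels.append(f"{polarity}/{level}")
    return labels


def quadrant_mask(labels, keep):
    """Return 1.0 for tokens in the kept quadrants and 0.0 otherwise.

    Multiplying per-token loss terms by this mask would isolate the
    update from the chosen quadrants (an assumed ablation mechanism).
    """
    return [1.0 if label in keep else 0.0 for label in labels]


# Toy response: one near-deterministic token and one uniform token.
toy_probs = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
toy_labels = four_quadrant_labels(toy_probs, reward_positive=True, entropy_threshold=0.5)
toy_mask = quadrant_mask(toy_labels, keep={"pos/high"})
```

In this toy case the first token falls in the positive, low-entropy quadrant and the second in the positive, high-entropy quadrant, so the mask keeps only the second token. A real setup would more likely derive the threshold from a batch-level entropy statistic, which is a design choice the truncated abstract does not settle.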