Score 5.7 · Source: cs.AI updates on arXiv.org · Published 2026-04-14
Rating rationale: above average — offers some informational gain and reference value
Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
arXiv:2510.01544v2 Announce Type: replace
Abstract: Diffusion-based large language models offer a non-autoregressive alternative for text generation, but enabling them to perform complex reasoning remains challenging. Reinforcement learning has recently emerged as an effective post-training strategy for improving their performance; however, existing methods rely primarily on outcome-based rewards, which provide no direct supervision over the denoising process and often result in poorly…
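The contrast the abstract draws — outcome-based rewards that ignore the denoising trajectory versus rewards that supervise each denoising step — can be illustrated with a minimal toy sketch. Everything below (function names, the trajectory representation, the step scorer) is an illustrative assumption, not the paper's actual method or API:

```python
# Toy contrast: outcome-based vs. per-step (process) reward assignment
# for RL post-training of a diffusion language model.
# All names and the trajectory encoding are hypothetical.

def outcome_credit(trajectory, outcome_reward):
    """Outcome-only RL: a single scalar judged on the final output,
    broadcast uniformly to every denoising step. Individual steps
    receive no direct supervision."""
    return [outcome_reward] * len(trajectory)

def process_credit(trajectory, step_scorer):
    """Process reward: each intermediate denoised state is scored,
    giving step-level credit over the denoising trajectory."""
    return [step_scorer(state) for state in trajectory]

# Toy trajectory: each "state" stands in for the quality of the
# partially denoised sequence at that step (0 = noise, 1 = clean).
trajectory = [0.1, 0.4, 0.35, 0.8, 1.0]

outcome = outcome_credit(trajectory, outcome_reward=1.0)
process = process_credit(trajectory, step_scorer=lambda s: s)

print(outcome)  # every step gets identical credit
print(process)  # the quality dip at step 3 (0.4 -> 0.35) is visible
```

Under outcome-only credit, the regression from 0.4 to 0.35 mid-trajectory is invisible to the learner; a process reward exposes it, which is the kind of denoising-level supervision the abstract argues is missing from existing methods.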