Score 5.7 · Source: cs.AI updates on arXiv.org · Published 2026-04-14
Rating rationale: above average — offers some informational gain and reference value
Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
arXiv:2510.01544v2 Announce Type: replace
Abstract: Diffusion-based large language models offer a non-autoregressive alternative for text generation, but enabling them to perform complex reasoning remains challenging. Reinforcement learning has recently emerged as an effective post-training strategy for improving their performance; however, existing methods rely primarily on outcome-based rewards, which provide no direct supervision over the denoising process and often result in poorly…
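The contrast the abstract draws — outcome-based rewards that ignore the denoising trajectory versus rewards that supervise each denoising step — can be illustrated with a minimal toy sketch. Everything below (function names, the trajectory representation, the step scorer) is an illustrative assumption, not the paper's actual method or API:

```python
# Toy contrast: outcome-based vs. per-step (process) reward assignment
# for RL post-training of a diffusion language model.
# All names and the trajectory encoding are hypothetical.

def outcome_credit(trajectory, outcome_reward):
    """Outcome-only RL: a single scalar judged on the final output,
    broadcast uniformly to every denoising step. Individual steps
    receive no direct supervision."""
    return [outcome_reward] * len(trajectory)

def process_credit(trajectory, step_scorer):
    """Process reward: each intermediate denoised state is scored,
    giving step-level credit over the denoising trajectory."""
    return [step_scorer(state) for state in trajectory]

# Toy trajectory: each "state" stands in for the quality of the
# partially denoised sequence at that step (0 = noise, 1 = clean).
trajectory = [0.1, 0.4, 0.35, 0.8, 1.0]

outcome = outcome_credit(trajectory, outcome_reward=1.0)
process = process_credit(trajectory, step_scorer=lambda s: s)

print(outcome)  # every step gets identical credit
print(process)  # the quality dip at step 3 (0.4 -> 0.35) is visible
```

Under outcome-only credit, the regression from 0.4 to 0.35 mid-trajectory is invisible to the learner; a process reward exposes it, which is the kind of denoising-level supervision the abstract argues is missing from existing methods.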