Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

Published

April 29, 2026

Collected April 29, 2026 06:31

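Academic frontier · score 5.5: revisits the entropy-collapse problem in RLVR and heuristic entropy interventions from the perspective of entropy change; offers guidance for training reasoning models.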

Source: arXiv cs.LG

Score 5.5 · Source: arXiv cs.LG · Published 2026-04-29

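Rationale for score: revisits the entropy-collapse problem in RLVR and heuristic entropy interventions from the perspective of entropy change; offers guidance for training reasoning models.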

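RLVR (reinforcement learning with verifiable rewards) is a cornerstone technique for improving the reasoning ability of LLMs, but training is often hampered by entropy collapse: policy entropy drops rapidly, which limits exploration. Recent work has proposed a variety of heuristic entropy interventions, yet the mechanisms behind them remain unclear. This paper offers a new view of the problem from the perspective of entropy change.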
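To make "entropy change" concrete, here is a minimal sketch of how a single policy-gradient update moves the entropy of one token distribution. It is not the paper's method or analysis. It assumes a single softmax distribution over a tiny vocabulary and a vanilla REINFORCE-style update on the logits. The function names, the logits, the advantage, and the learning rate are all hypothetical illustrations.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single token distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy H = -sum p log p (natural log), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pg_step(logits: List[float], action: int, advantage: float, lr: float) -> List[float]:
    """One vanilla policy-gradient step on softmax logits for one sampled action.

    Hypothetical helper. For a softmax policy, d log pi(a) / d z_k = 1[k == a] - pi(k),
    so the update is z_k += lr * A * (1[k == a] - pi(k)).
    """
    probs = softmax(logits)
    return [
        z + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (z, p) in enumerate(zip(logits, probs))
    ]


def entropy_change(logits: List[float], action: int, advantage: float, lr: float) -> float:
    """Delta H = H(after update) - H(before update) at a single token position."""
    before = entropy(softmax(logits))
    after = entropy(softmax(pg_step(logits, action, advantage, lr)))
    return after - before


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.0, -1.0]  # hypothetical logits; token 0 is already the most likely
    # Positively rewarding the already-likely token sharpens the distribution.
    print(entropy_change(logits, action=0, advantage=1.0, lr=0.5))
    # Positively rewarding a low-probability token spreads probability mass out.
    print(entropy_change(logits, action=3, advantage=1.0, lr=0.5))
```

In this toy setting, reinforcing an already-confident token lowers entropy, while reinforcing a rare token raises it. Repeatedly rewarding high-probability tokens is one intuitive route to the rapid entropy drop described above, under these simplifying assumptions.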
