Tag: 长程推理 (long-horizon reasoning)

All articles tagged "长程推理" (long-horizon reasoning).

7.0
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
March 27, 2026
· cs.CL updates on arXiv.org · collected 03/27 12:31
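Revisits the fragility of on-policy distillation (OPD) in long-horizon settings and identifies a systematic problem: the sampled-token variant reduces distribution matching to a single-token signal.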