PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners

Published

2026-04-30

Collected 2026-04-30 06:33

Academic Frontier, 6.0 points: Combines privileged on-policy information with adaptive interpolation for reasoner training; addresses the credit assignment challenge

Original article: cs.LG updates on arXiv.org

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-30

Scoring rationale: Combines privileged on-policy information with adaptive interpolation for reasoner training; addresses the credit assignment challenge

Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe RL

Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving