Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

Published: April 14, 2026

Collected: April 14, 2026, 04:31

Academic Frontier, score 6.0: Agent self-distillation with skill conditioning addresses real RL sample-efficiency issues in multi-turn agents. Relevant to the agent development community.

Source: cs.AI updates on arXiv.org
