星际流动

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Academic Frontier · Score 3.2: Moderate AI relevance +novelty(1) +practical(2)
Original source: cs.CL updates on arXiv.org

Score 3.2 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

Scoring rationale: Moderate AI relevance +novelty(1) +practical(2)

arXiv:2510.14420v4 · Announce type: replace

Abstract: Language models often struggle to follow multi-constraint instructions that are crucial for real-world applications. Existing reinforcement learning (RL) approaches depend on external supervision and suffer from sparse reward signals in multi-constraint tasks. We propose a label-free self-supervised RL framework that eliminates dependency on external supervision by deriving reward signals directly from instructions and generating…