星际流动

Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text

Industry News · Score 6.0: Creative use of pre-training text as a source of self-play signals. Relevant to reducing RL data requirements.
Original: arXiv

Rating rationale: Creative use of pre-training text as a source of self-play signals. Relevant to reducing RL data requirements.