星际流动

Removing Sandbagging in LLMs by Training with Weak Supervision

Academic Frontier, 7.5 points. Important alignment problem: eliciting best work from capable models under weak supervision. Relevant to AI safety deployment.
Original: arxiv.org

Score 7.5 · Source: · Published 2026-04-27

Scoring rationale: Important alignment problem: eliciting best work from capable models under weak supervision. Relevant to AI safety deployment.