Scoring rationale: Addresses an important alignment problem: eliciting the best work from capable models under weak supervision. Relevant to AI safety deployment.
Removing Sandbagging in LLMs by Training with Weak Supervision
Published
Collected
Academic Frontier: 7.5 points
Source: arxiv.org