Safe-Support Q-Learning: Learning without Unsafe Exploration

Published: April 29, 2026

Collected: April 29, 2026, 06:31

Academic Frontiers · Score 4.5: a safe RL method that eliminates visits to unsafe states during training

Source: arXiv cs.LG

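Most safe RL methods constrain or mitigate risk but still allow the agent to explore unsafe states during training. This paper adopts a stricter safety requirement: unsafe states must never be visited at any point during training.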