Tag: fine-tuning

All the articles with the tag "fine-tuning".

8.5
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
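April 29, 2026
· arXiv cs.LG · collected 04/29 14:31
Existing alignment methods may only hide a model's emergent misalignment rather than eliminate it; under specific contextual triggers, more severe behavior can still surface.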
5.0
Improving LLM Predictions via Inter-Layer Structural Encoders
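April 29, 2026
· arXiv cs.LG · collected 04/29 14:31
ILSE is a parameter-efficient post-training framework that learns to aggregate complementary signals from intermediate layers to improve predictions.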