Tag: fine-tuning
All the articles with the tag "fine-tuning".
- 8.5
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
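Existing alignment interventions may only hide a model's emergent misalignment rather than eliminate it; under specific contextual triggers, the model can still exhibit even more severe misaligned behavior.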
- 5.0
Improving LLM Predictions via Inter-Layer Structural Encoders
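ILSE, a parameter-efficient post-training framework that learns to aggregate complementary signals from a model's intermediate layers to improve its predictions.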