Tag: jailbreak
All articles tagged "jailbreak".
- 7.0
When Safety Fails Before the Answer: Benchmarking Harmful Behavior Detection in Reasoning Chains
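The first benchmark for detecting harmful behavior at the reasoning-chain level, capturing the full chain of behaviors in a jailbreak, from suppressing refusal to concealing risk.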