Score 6.3 · Source: cs.AI updates on arXiv.org · Published 2026-04-08
Scoring rationale: the effect of emotional stimuli on LLM safety alignment
arXiv:2604.04992v1 Announce Type: cross Abstract: Safety-aligned LLMs go through refusal training to reject harmful requests, but whether these mechanisms remain effective under emotionally charged stimuli is unexplored. We introduce FreakOut-LLM, a framework investigating whether emotional context compromises safety alignment in adversarial settings. Using validated psychological stimuli, we evaluate how emotional priming through system prompts affects jailbreak susceptibility across ten LLMs. We test three conditions (stress, relaxation, neutral) using scenarios from established psychologica