Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Published

April 14, 2026

Collected April 14, 2026, 04:31

Academic Frontier · 6.0 points: above average, with some new information and reference value

Score 6.0 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

Scoring rationale: above average, with some new information and reference value

arXiv:2604.10681v1 Announce Type: cross Abstract: Large Language Models (LLMs), despite their impressive capabilities across domains, have been shown to be vulnerable to backdoor attacks. Prior backdoor strategies predominantly operate at the token level, where an injected trigger causes the model to generate a specific target word, choice, or class (depending on the task). Recent advances, however, exploit the long-form reasoning tendencies of modern LLMs to conduct reasoning-level backdoors:…