
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models

Academic Frontier · Score 6.3 — Diagnosing surface compliance in LLMs: the model agrees but has not truly learned

Source: cs.AI updates on arXiv.org · Published 2026-04-08

arXiv:2604.05995v1 Announce Type: cross Abstract: Large Language Models (LLMs) internalize vast world knowledge as parametric memory, yet inevitably inherit the staleness and errors of their source corpora. Consequently, ensuring the reliability and malleability of these internal representations is imperative for trustworthy real-world deployment. Knowledge editing offers a pivotal paradigm for surgically modifying memory without retraining. However, while recent editors demonstrate high success rates on standard benchmarks, it remains questionable whether current evaluation frameworks that re
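The abstract's distinction between surface compliance and genuine editing can be illustrated with a minimal sketch. The "model" below is a toy lookup-table stand-in, not a real LLM, and all prompts and names (ExampleCorp, Alice, Bob) are hypothetical; the point is only the evaluation pattern: an editor can score perfectly on the exact edit prompt while failing every paraphrase probe.

```python
# Hypothetical sketch of surface compliance vs. genuine editing.
# The "model" is a toy stand-in (a closure over a lookup rule), not an LLM.

def make_surface_edited_model(edit_prompt: str, new_answer: str, old_answer: str):
    """A model that only 'complies' on the exact edit prompt (surface compliance)."""
    def model(prompt: str) -> str:
        return new_answer if prompt == edit_prompt else old_answer
    return model

def edit_success_rate(model, probes, expected) -> float:
    """Fraction of probe prompts answered with the edited fact."""
    return sum(model(p) == expected for p in probes) / len(probes)

edit_prompt = "Who is the CEO of ExampleCorp?"
paraphrases = [
    "ExampleCorp's chief executive is whom?",
    "Name the current CEO of ExampleCorp.",
]

model = make_surface_edited_model(edit_prompt, new_answer="Alice", old_answer="Bob")

# Perfect score on the exact edit prompt, zero on paraphrase probes:
print(edit_success_rate(model, [edit_prompt], "Alice"))  # → 1.0
print(edit_success_rate(model, paraphrases, "Alice"))    # → 0.0
```

A benchmark that only checks the exact edit prompt would report 100% success here, which is precisely the evaluation gap the paper questions.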
