Score: 8 · Source: cs.LG updates on arXiv.org · Published: 2026-04-29
Scoring rationale: Landmark discovery identifying attention heads shared between error detection and sycophancy across 12 models from 5 labs; a major interpretability result.
LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit
Category: Academic Frontier · 8.0 points