LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit

Published: April 29, 2026

Collected: April 30, 2026, 06:33

Academic Frontier, score 8.0: Landmark discovery identifying shared attention heads for error detection and sycophancy across 12 models from 5 labs; a major interpretability result.

Source: cs.LG updates on arXiv.org

