Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Published

2026-04-15

Collected 2026-04-15 04:35

Academic Frontier, 3.7 points: Moderate AI relevance +novelty(1) +practical(3)

Score 3.7 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

arXiv:2601.14004v4 Announce Type: replace

Abstract: Mechanistic Interpretability (MI) has emerged as a vital approach to demystify the opaque decision-making of Large Language Models (LLMs). However, existing reviews primarily treat MI as an observational science, summarizing analytical insights while lacking a systematic framework for actionable intervention. To bridge this gap, we present a practical survey structured around the pipeline: “Locate, Steer, and Improve.” We formally categorize…