评分 3.7 · 来源:cs.CL updates on arXiv.org · 发布于 2026-04-15
评分依据:Moderate AI relevance +novelty(1) +practical(3)
arXiv:2601.14004v4 Announce Type: replace Abstract: Mechanistic Interpretability (MI) has emerged as a vital approach to demystify the opaque decision-making of Large Language Models (LLMs). However, existing reviews primarily treat MI as an observational science, summarizing analytical insights while lacking a systematic framework for actionable intervention. To bridge this gap, we present a practical survey structured around the pipeline: “Locate, Steer, and Improve.” We formally categorize…