
A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

Academic Frontier · Score 6.0 — Solid mechanistic interpretability work identifying the sink circuit components; causal interventions validate the findings.

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

arXiv:2604.14722v1 · Announce Type: new

Abstract: Transformers commonly exhibit an attention sink: disproportionately high attention to the first position. We study this behavior in GPT-2-style models with learned query biases and absolute positional embeddings. Combining structural analysis with causal interventions, validated across natural-language, mathematical, and code inputs, we find that the sink arises from the interaction among (i) a learned query bias, (ii) the first-layer MLP transformation of the positional encoding, and (iii) structure in the key projection.
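The mechanism the abstract describes can be illustrated with a toy attention head. This is a minimal NumPy sketch, not the paper's actual circuit: the names (`b_q`, `keys`, etc.) and the way ingredient (ii) and (iii) are collapsed into a single hand-crafted first-position key are assumptions for illustration. The point is only that a query bias shared by all positions, aligned with a distinctive first-position key, is enough to concentrate softmax attention on position 0.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 16  # head dimension (illustrative)
T = 8   # sequence length (illustrative)

# Hypothetical stand-ins for the three ingredients named in the abstract:
# (i)   a learned query bias shared across positions,
# (ii)  a distinctive first-position key (standing in for the first-layer
#       MLP's transform of the positional encoding),
# (iii) key-projection structure that aligns that key with the bias.
b_q = rng.normal(size=d)                      # (i) learned query bias
keys = rng.normal(size=(T, d)) * 0.3          # content keys: small, unstructured
keys[0] = 3.0 * b_q / np.linalg.norm(b_q)     # (ii)+(iii): pos-0 key aligned with bias

queries = rng.normal(size=(T, d)) * 0.3 + b_q  # every query carries the bias
scores = queries @ keys.T / np.sqrt(d)
attn = softmax(scores)

# The first column dominates at every query position: an attention sink.
print(attn[:, 0])
```

Because the bias term dominates each query, the dot product with the aligned position-0 key is large and roughly constant across query positions, while scores against the small content keys stay near zero; the softmax then routes most of the mass to position 0 everywhere, which is exactly the sink signature. Zeroing `b_q` or re-randomizing `keys[0]` (the analogue of the paper's causal interventions) removes the effect.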