
A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

Academic Frontier · Score 6.0 — Solid mechanistic interpretability work identifying the sink circuit components; causal interventions validate the findings.

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

arXiv:2604.14722v1 · Announce Type: new

Abstract: Transformers commonly exhibit an attention sink: disproportionately high attention to the first position. We study this behavior in GPT-2-style models with learned query biases and absolute positional embeddings. Combining structural analysis with causal interventions, validated across natural-language, mathematical, and code inputs, we find that the sink arises from the interaction among (i) a learned query bias, (ii) the first-layer MLP transformation of the positional encoding, and (iii) structure in the key projection.
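The mechanism the abstract describes can be illustrated with a toy attention head. This is a minimal NumPy sketch, not the paper's actual circuit: the names (`b_q`, `keys`, etc.) and the way ingredient (ii) and (iii) are collapsed into a single hand-crafted first-position key are assumptions for illustration. The point is only that a query bias shared by all positions, aligned with a distinctive first-position key, is enough to concentrate softmax attention on position 0.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 16  # head dimension (illustrative)
T = 8   # sequence length (illustrative)

# Hypothetical stand-ins for the three ingredients named in the abstract:
# (i)   a learned query bias shared across positions,
# (ii)  a distinctive first-position key (standing in for the first-layer
#       MLP's transform of the positional encoding),
# (iii) key-projection structure that aligns that key with the bias.
b_q = rng.normal(size=d)                      # (i) learned query bias
keys = rng.normal(size=(T, d)) * 0.3          # content keys: small, unstructured
keys[0] = 3.0 * b_q / np.linalg.norm(b_q)     # (ii)+(iii): pos-0 key aligned with bias

queries = rng.normal(size=(T, d)) * 0.3 + b_q  # every query carries the bias
scores = queries @ keys.T / np.sqrt(d)
attn = softmax(scores)

# The first column dominates at every query position: an attention sink.
print(attn[:, 0])
```

Because the bias term dominates each query, the dot product with the aligned position-0 key is large and roughly constant across query positions, while scores against the small content keys stay near zero; the softmax then routes most of the mass to position 0 everywhere, which is exactly the sink signature. Zeroing `b_q` or re-randomizing `keys[0]` (the analogue of the paper's causal interventions) removes the effect.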