
Threshold Differential Attention for Sink-Free, Ultra-Sparse, and Non-Dispersive Language Modeling

Academic Frontier · Score 5.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

Rating rationale: Novel attention mechanism addressing the attention-sink problem; could influence efficient inference design

arXiv:2601.12145v2 · Announce Type: replace

Abstract: Softmax attention struggles with long contexts due to structural limitations: the strict sum-to-one constraint forces attention sinks on irrelevant tokens, and probability mass disperses as sequence length increases.
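To make the dispersion and sink claims concrete, here is a minimal NumPy sketch, not from the paper itself: the sequence lengths, logit scale, and the +2.0 boost on the "relevant" token are illustrative assumptions. It shows that under softmax the relevant token's weight decays roughly like 1/n as the context grows, while the sum-to-one constraint forces the remaining mass onto irrelevant tokens regardless.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

for n in [128, 1024, 8192, 65536]:
    # Illustrative scores: one weakly relevant token among n-1 noise tokens.
    scores = rng.normal(loc=0.0, scale=1.0, size=n)
    scores[0] += 2.0  # hypothetical logit boost for the "relevant" token
    w = softmax(scores)
    # Sum-to-one forces the leftover mass onto irrelevant tokens (a sink),
    # and the relevant token's share shrinks as n grows (dispersion).
    print(f"n={n:6d}  relevant weight={w[0]:.4f}  "
          f"mass on irrelevant tokens={1 - w[0]:.4f}")
```

Running this, the relevant token's weight drops steadily with n even though its logit advantage is unchanged; this is the structural dispersion the abstract describes, and presumably the behavior a thresholded, sum-to-one-relaxing mechanism would target.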