Score 5.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17
Rating rationale: Novel attention mechanism addressing the attention-sink problem; could influence efficient inference design
arXiv:2601.12145v2 Announce Type: replace Abstract: Softmax attention struggles with long contexts due to structural limitations: the strict sum-to-one constraint forces attention sinks on irrelevant tokens, and probability mass disperses as sequence lengths increase.
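The sum-to-one constraint the abstract describes can be seen in a minimal NumPy sketch (the scores and function names below are illustrative, not from the paper): even when a query matches no key well, softmax must still distribute a full unit of probability mass, so some tokens absorb attention despite being irrelevant.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

# A query that matches none of the keys well: all similarity scores are low.
scores = np.array([0.1, 0.05, -0.2, 0.0])
weights = softmax(scores)

# The strict sum-to-one constraint forces the mass to go somewhere,
# so irrelevant tokens still receive non-trivial attention weight.
print(weights, weights.sum())  # weights sum to exactly 1.0
```

As the sequence grows, this same fixed budget of probability mass is spread over more tokens, which is the dispersion effect the abstract mentions.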