星际流动

DySCO: Dynamic Attention-Scaling Decoding for Long-Context Language Models

Academic Frontier · Score 5.5 · Source: cs.CL updates on arXiv.org · Published 2026-04-17

Rating rationale: a decoding algorithm for long-context reasoning; dynamic attention scaling is a sensible approach to the accuracy-length tradeoff.

arXiv:2602.22175v2 Announce Type: replace

Abstract: Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windows, their accuracy often deteriorates as input length grows; in practice, models struggle to keep attention aligned with the most relevant context throughout decoding. In this work, we propose DySCO, a novel decoding algorithm for improving long-context reasoning.
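The abstract does not spell out DySCO's mechanism. As a generic illustration of length-dependent attention scaling (an assumption for exposition, not the paper's actual algorithm), one common idea is to sharpen the softmax as the context grows, multiplying attention scores by a factor that increases with sequence length so attention does not diffuse over long inputs. A minimal sketch, with `base_len` as a hypothetical reference length:

```python
import numpy as np

def length_scaled_attention(q, K, V, base_len=2048):
    """Single-query attention with a length-dependent logit scale.

    q: (d,) query vector; K, V: (n, d) key/value matrices.
    This is an illustrative sketch, not DySCO as published.
    """
    n, d = K.shape
    scores = K @ q / np.sqrt(d)  # standard scaled dot-product logits
    # Hypothetical dynamic scaling: for contexts longer than base_len,
    # amplify the logits by log(n)/log(base_len) >= 1, which peaks the
    # softmax on the highest-scoring tokens instead of letting attention
    # spread thin as n grows.
    scale = max(1.0, np.log(n) / np.log(base_len))
    scores = scores * scale
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V

# Usage: attend over a 4096-token context with 64-dim heads.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))
V = rng.standard_normal((4096, 64))
out = length_scaled_attention(q, K, V)
```

The design intuition is that softmax entropy rises with the number of keys, so a fixed logit scale that works at short lengths under-concentrates attention at long lengths; a length-aware multiplier is one simple way to counteract that drift.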