Score 5.5 · Source: cs.CL updates on arXiv.org · Published 2026-04-17
Rating rationale: a decoding algorithm for long-context reasoning; dynamic attention scaling is a sensible approach to the accuracy-length tradeoff.
arXiv:2602.22175v2 Announce Type: replace Abstract: Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windows, their accuracy often deteriorates as input length grows. In practice, models often struggle to keep attention aligned with the most relevant context throughout decoding. In this work, we propose DYSCO, a novel decoding algorithm for improving long-context reasoning.
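The abstract does not specify how DYSCO adjusts attention during decoding, but the rationale above mentions dynamic attention scaling. A minimal sketch of one plausible form of that idea is below, assuming a length-dependent temperature applied to the attention logits so the distribution stays sharp as the context grows; the function name, the `train_len` parameter, and the log-ratio scaling rule are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def length_scaled_attention(q, K, V, train_len=4096):
    """Single-query attention with a length-dependent temperature.

    Hypothetical sketch: when the context length n exceeds train_len,
    logits are multiplied by log(n)/log(train_len), sharpening the
    softmax so probability mass stays concentrated on relevant tokens
    in long contexts. DYSCO's actual rule is not given in the abstract.
    """
    n, d = K.shape
    # Standard scaled dot-product logits for one query vector.
    logits = K @ q / np.sqrt(d)
    # Length-dependent temperature: identity at or below train_len,
    # increasingly sharp beyond it.
    temp = max(1.0, np.log(n) / np.log(train_len))
    logits = logits * temp
    # Numerically stable softmax.
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V, w

rng = np.random.default_rng(0)
q = rng.standard_normal(4)
K = rng.standard_normal((10, 4))
V = rng.standard_normal((10, 4))
out, w = length_scaled_attention(q, K, V)
```

With a short `train_len` (e.g. 8) and a longer context, the temperature exceeds 1 and the attention distribution becomes strictly sharper than the unscaled one, which is the qualitative effect the rationale attributes to dynamic attention scaling.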