LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

Published: 2026-04-08

Collected: 2026-04-08 04:31

Academic Frontier, 5.7 points: an AI research paper of some reference value

Score: 5.7 · Source: cs.CL updates on arXiv.org · Published 2026-04-08

arXiv:2502.17421v3 Announce Type: replace

Abstract: As Large Language Models (LLMs) can now process extremely long contexts, efficient inference over these extended inputs has become increasingly important, especially for emerging applications such as LLM agents that depend heavily on this capability. Speculative decoding (SD) offers a promising lossless acceleration technique, in contrast to lossy alternatives such as quantization and model cascades. However, most state-of-the-art SD methods are trained on short texts (typically fewer than 4k tokens), making them unsuitable for long-context scenarios.
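
To illustrate why the abstract calls speculative decoding lossless, here is a minimal sketch of the generic greedy draft-then-verify loop. It is not LongSpec's method, and every name in it (`speculative_decode`, `greedy_decode`, `toy_target`, `toy_draft`, the parameter `k`) is hypothetical. The "models" are toy functions that map a token prefix to a next token. A real system would verify all drafted positions in one batched forward pass of the target model.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: plain autoregressive greedy decoding with the target model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify loop; output equals greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed token against its own greedy choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal]


# Toy models (assumptions for illustration only).
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except when the prefix length is a multiple of 3.
    return toy_target(seq) if len(seq) % 3 else (toy_target(seq) + 1) % 50
```

Because every emitted token is either confirmed or supplied by the target model, the output is identical to ordinary greedy decoding with the target. The draft model only affects how many tokens are accepted per verification step. Sampling-based variants use a probabilistic acceptance rule instead of exact matching, which this sketch does not cover.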