星际流动

Faster LLM Inference via Sequential Monte Carlo

Academic Frontier, score 6.0 — SMC-SD: Accelerating Speculative Decoding by Replacing Rejection Sampling with Sequential Monte Carlo
Original source: cs.LG updates on arXiv.org

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-20

Scoring basis: SMC-SD: Accelerating Speculative Decoding by Replacing Rejection Sampling with Sequential Monte Carlo

Key Points

arXiv:2604.15672v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates language model inference by drafting tokens from a cheap proposal model and verifying them against an expensive target model via rejection sampling. Because rejection truncates the draft block at the first error, throughput degrades when draft and target diverge. Rather than rejecting draft tokens outright, we propose to reweight them. To this end, we introduce sequential Monte Carlo speculative decoding (SMC-SD), which replaces token-level rejection with importance-weighted resampling over a population of dr…
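The abstract is truncated before the method details, so the following is only a minimal sketch of the contrast it sets up: standard speculative decoding accepts each draft token with probability min(1, p/q) and truncates the block at the first rejection, whereas a sequential-Monte-Carlo variant would keep a population of draft continuations, reweight them by the importance ratio p/q, and resample when the effective sample size drops. All function names, the particle count, and the ESS threshold below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_prob(p, q):
    """Standard speculative decoding: accept a draft token with prob min(1, p/q),
    where p is the target model's probability and q the draft model's."""
    return min(1.0, p / q)

def smc_reweight(weights, p, q):
    """SMC-style alternative (sketch): instead of accept/reject, multiply each
    particle's weight by its importance ratio p/q, then renormalize."""
    w = weights * (p / q)
    return w / w.sum()

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; low ESS means the
    population has degenerated onto a few particles."""
    return 1.0 / np.sum(weights ** 2)

def resample(particles, weights):
    """Multinomial resampling: draw particles proportional to weight,
    then reset weights to uniform."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx], np.full(len(particles), 1.0 / len(particles))

# Toy step: 4 particles, each holding a different drafted token.
# The per-particle target/draft probabilities below are made up.
particles = ["tok_a", "tok_b", "tok_c", "tok_d"]
weights = np.full(4, 0.25)
p = np.array([0.30, 0.05, 0.20, 0.10])  # target-model probabilities (illustrative)
q = np.full(4, 0.25)                    # draft-model probabilities (illustrative)

weights = smc_reweight(weights, p, q)
if effective_sample_size(weights) < 2.0:  # threshold is a free design choice
    particles, weights = resample(particles, weights)
```

The point of the contrast: `accept_prob` discards information whenever p < q (the block is cut at the first rejection), while `smc_reweight` keeps every drafted continuation alive with a corrected weight, deferring any pruning to the resampling step.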

🤖 AI Commentary

This paper provides significant information for the AI field and merits attention from industry practitioners.


Tags: