星际流动

Faster LLM Inference via Sequential Monte Carlo

Academic Frontier, score 6.0 — SMC-SD: Accelerating Speculative Decoding by Replacing Rejection Sampling with Sequential Monte Carlo
Original source: cs.LG updates on arXiv.org

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-20

Scoring basis: SMC-SD: Accelerating Speculative Decoding by Replacing Rejection Sampling with Sequential Monte Carlo

Key Points

arXiv:2604.15672v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates language model inference by drafting tokens from a cheap proposal model and verifying them against an expensive target model via rejection sampling. Because rejection truncates the draft block at the first error, throughput degrades when draft and target diverge. Rather than rejecting draft tokens outright, we propose to reweight them. To this end, we introduce sequential Monte Carlo speculative decoding (SMC-SD), which replaces token-level rejection with importance-weighted resampling over a population of dr…
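The abstract is truncated before the method details, so the following is only a minimal sketch of the contrast it sets up: standard speculative decoding accepts each draft token with probability min(1, p/q) and truncates the block at the first rejection, whereas a sequential-Monte-Carlo variant would keep a population of draft continuations, reweight them by the importance ratio p/q, and resample when the effective sample size drops. All function names, the particle count, and the ESS threshold below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_prob(p, q):
    """Standard speculative decoding: accept a draft token with prob min(1, p/q),
    where p is the target model's probability and q the draft model's."""
    return min(1.0, p / q)

def smc_reweight(weights, p, q):
    """SMC-style alternative (sketch): instead of accept/reject, multiply each
    particle's weight by its importance ratio p/q, then renormalize."""
    w = weights * (p / q)
    return w / w.sum()

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; low ESS means the
    population has degenerated onto a few particles."""
    return 1.0 / np.sum(weights ** 2)

def resample(particles, weights):
    """Multinomial resampling: draw particles proportional to weight,
    then reset weights to uniform."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx], np.full(len(particles), 1.0 / len(particles))

# Toy step: 4 particles, each holding a different drafted token.
# The per-particle target/draft probabilities below are made up.
particles = ["tok_a", "tok_b", "tok_c", "tok_d"]
weights = np.full(4, 0.25)
p = np.array([0.30, 0.05, 0.20, 0.10])  # target-model probabilities (illustrative)
q = np.full(4, 0.25)                    # draft-model probabilities (illustrative)

weights = smc_reweight(weights, p, q)
if effective_sample_size(weights) < 2.0:  # threshold is a free design choice
    particles, weights = resample(particles, weights)
```

The point of the contrast: `accept_prob` discards information whenever p < q (the block is cut at the first rejection), while `smc_reweight` keeps every drafted continuation alive with a corrected weight, deferring any pruning to the resampling step.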

🤖 AI Commentary

This paper provides significant information for the AI field and merits attention from industry practitioners.


Tags: