Score 6.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17
Scoring rationale: Practical cost reduction for agent deployment via early termination and model routing; directly relevant to production agents.
arXiv:2604.15075v1 Announce Type: cross Abstract: Open-weight Small Language Models (SLMs) can provide faster local inference at lower financial cost, but may not reach the performance of commercial Large Language Models (LLMs) that are orders of magnitude larger. Consequently, many of the latest LLM applications, such as software engineering agents, tend to be evaluated on larger models only, which leaves the problem of improving their cost-benefit trade-off neglected.