星际流动

BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching

Industry news · Score 7.0 — Batch inference optimization with prefix sharing. Directly applicable to high-throughput LLM serving.

Source: arXiv
