Rating rationale: Batch inference optimization with prefix sharing. Directly applicable to high-throughput LLM serving.
BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
Industry News · Score: 7.0
Source: arXiv
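The title names the paper's two ideas: global prefix sharing and throughput-oriented token batching. As a rough illustration of the first idea only, the Python sketch below groups requests by a common token prefix so the shared prefix's KV cache is prefilled once per group rather than once per request. The fixed `prefix_len` and the `prefill_fn`/`decode_fn` hooks are hypothetical stand-ins for this sketch, not BatchLLM's actual scheduler or API.

```python
from collections import defaultdict

def group_by_shared_prefix(requests, prefix_len):
    """Group token sequences by their first `prefix_len` tokens so the
    shared-prefix KV cache can be computed once per group."""
    groups = defaultdict(list)
    for req in requests:
        groups[tuple(req[:prefix_len])].append(req)
    return groups

def run_batched(requests, prefix_len, prefill_fn, decode_fn):
    """Prefill each distinct prefix once, then decode all suffixes that
    share it as one batch, reusing the cached prefix state."""
    for prefix, members in group_by_shared_prefix(requests, prefix_len).items():
        kv_cache = prefill_fn(list(prefix))       # one prefill per shared prefix
        suffixes = [req[prefix_len:] for req in members]
        yield from decode_fn(kv_cache, suffixes)  # batched decode over suffixes

if __name__ == "__main__":
    # Toy demo: two requests share the prefix [1, 2, 3]; it is prefilled once.
    reqs = [[1, 2, 3, 40], [1, 2, 3, 50], [9, 9, 9, 60]]
    prefill = lambda prefix: {"prefix": prefix}            # stand-in KV cache
    decode = lambda kv, sfx: [(kv["prefix"], s) for s in sfx]
    print(list(run_batched(reqs, prefix_len=3, prefill_fn=prefill, decode_fn=decode)))
```

A real serving scheduler would discover shared prefixes globally across the whole batch (e.g. via a prefix tree) instead of assuming a fixed cut point; the fixed-length grouping here only approximates that.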