PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

Published

April 29, 2026

Collected 2026-04-29 06:31

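Engineering practice, 6.0 points: a compressed KV cache pool shared across multiple agents, with asymmetric int8/int4 quantization; of practical value for multi-agent inference deployment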

Score 6 · Source: arXiv cs.LG · Published 2026-04-29

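Scoring rationale: a compressed KV cache pool shared across multiple agents, with asymmetric int8/int4 quantization; of practical value for multi-agent inference deployment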

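PolyKV lets multiple concurrent inference agents share a single asymmetrically compressed KV cache pool instead of allocating a separate KV cache per agent. The compression is asymmetric in that Keys and Values get different precision: Keys are quantized to int8 to preserve softmax stability, while Values are quantized at lower precision. The shared pool is injected into N independent agent contexts through HuggingFace DynamicCache.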
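The idea of one pool with different precision for Keys and Values can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes simple per-row symmetric absmax quantization, takes int4 for Values from the int8/int4 note in the rating line, uses plain Python lists instead of tensors, and does not touch HuggingFace DynamicCache. The names `QuantizedRow`, `SharedKVPool`, `quantize_row`, and the toy key/value data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizedRow:
    codes: tuple   # signed integer codes, one per element
    scale: float   # step size used to map codes back to floats
    bits: int      # bit width the codes were quantized to


def quantize_row(row, bits):
    """Symmetric absmax quantization of one vector to signed `bits`-bit codes."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8, 7 for int4
    absmax = max(abs(x) for x in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = tuple(max(-qmax, min(qmax, round(x / scale))) for x in row)
    return QuantizedRow(codes, scale, bits)


def dequantize_row(q):
    """Map integer codes back to approximate float values."""
    return [c * q.scale for c in q.codes]


class SharedKVPool:
    """One compressed copy of the shared keys and values, read by every agent."""

    def __init__(self, keys, values, key_bits=8, value_bits=4):
        # Assumption: Keys get the higher precision, Values the lower one.
        self.keys = [quantize_row(k, key_bits) for k in keys]
        self.values = [quantize_row(v, value_bits) for v in values]

    def read(self):
        """Return dequantized (keys, values) for an agent to attend over."""
        return ([dequantize_row(k) for k in self.keys],
                [dequantize_row(v) for v in self.values])


def max_abs_error(original, restored):
    """Largest element-wise reconstruction error across all rows."""
    return max(abs(a - b)
               for row_a, row_b in zip(original, restored)
               for a, b in zip(row_a, row_b))


# Toy shared context: two token positions, head dimension 4 (made-up numbers).
keys = [[0.10, -0.52, 0.33, 0.90], [-0.75, 0.20, 0.05, -0.40]]
values = [[0.60, -0.15, 0.25, -0.80], [0.45, 0.70, -0.30, 0.10]]

pool = SharedKVPool(keys, values)
# Every agent holds a reference to the same pool rather than its own copy.
agents = {f"agent_{i}": pool for i in range(3)}

restored_keys, restored_values = pool.read()
key_error = max_abs_error(keys, restored_keys)
value_error = max_abs_error(values, restored_values)
```

On this toy data the int8 Keys reconstruct with a much smaller error than the int4 Values, which is the trade-off the asymmetric scheme accepts: attention scores pass through softmax and are sensitive to Key error, while Values are only averaged. Because all three agents reference the same `pool` object, the compressed cache is stored once regardless of the number of agents.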

Tags:

kv-cache
multi-agent
quantization
inference

From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models