Scoring rationale: Novel KV cache compression approach using temporal tiering; directly addresses a key scalability bottleneck for long-context models.
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
Industry News · Score: 7.5
Source: arXiv