
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

Industry News · Score: 7.5
Source: arXiv


Rating rationale: A novel KV cache compression approach using temporal tiering; it directly addresses a key scalability bottleneck for long-context models.
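The listing gives only a one-line summary, so the following is a minimal sketch of the *general* idea of a temporally tiered KV cache, not the TTKV paper's actual algorithm: recent tokens stay in a full-precision "hot" tier, while older entries are demoted to a quantized "cold" tier to cut memory. The class name, tier policy, and int8 quantization scheme are all illustrative assumptions.

```python
from collections import deque

import numpy as np


class TieredKVCache:
    """Toy temporally tiered KV cache (hypothetical illustration, not TTKV).

    Recent tokens are kept in fp32 in a bounded "hot" tier; entries
    evicted from the hot tier are symmetrically quantized to int8 and
    moved to an append-only "cold" tier.
    """

    def __init__(self, hot_capacity=4):
        self.hot_capacity = hot_capacity
        self.hot = deque()   # (token_id, fp32 key, fp32 value)
        self.cold = []       # (token_id, int8 key, k_scale, int8 value, v_scale)

    @staticmethod
    def _quantize(x):
        # Symmetric per-vector int8 quantization; scale maps max |x| to 127.
        scale = float(np.abs(x).max()) / 127.0 or 1.0
        return (x / scale).round().astype(np.int8), scale

    def append(self, token_id, k, v):
        self.hot.append((token_id, k, v))
        while len(self.hot) > self.hot_capacity:
            tid, ko, vo = self.hot.popleft()  # demote the oldest entry
            kq, ks = self._quantize(ko)
            vq, vs = self._quantize(vo)
            self.cold.append((tid, kq, ks, vq, vs))

    def all_kv(self):
        # Dequantize the cold tier and concatenate with the hot tier,
        # oldest token first, for use in attention.
        out = [(tid, kq.astype(np.float32) * ks, vq.astype(np.float32) * vs)
               for tid, kq, ks, vq, vs in self.cold]
        out.extend(self.hot)
        return out
```

A real system would tier per attention head and layer, and might drop or merge very old entries entirely; this sketch only shows the hot/cold demotion mechanic that "temporal tiering" suggests.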