Rating rationale: Practical caching framework for LLM serving cost reduction; directly applicable to production inference systems.
Continuous Semantic Caching for Low-Cost LLM Serving
Industry News · Score: 7.0
Source: arXiv