Rating rationale: Practical caching framework for LLM serving cost reduction; directly applicable to production inference systems.
Continuous Semantic Caching for Low-Cost LLM Serving
Industry News · Score: 7.0
Source: arXiv