Tag: serving
All the articles with the tag "serving".
- 7.0
SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
SAW-INT4: a 4-bit KV-cache quantization scheme that accounts for real-world LLM serving constraints (paged memory, regular memory access, fused attention)