Tag: deployment
All the articles with the tag "deployment".
- 7.0
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
Dynamic safety monitoring: an LLM activation-monitoring method that flexibly adjusts compute cost according to input difficulty
- 7.0
SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
SAW-INT4: a 4-bit KV-cache quantization scheme that accounts for real-world LLM serving constraints (paged memory, regular memory access, fused attention)