Rating rationale: A practical deployment tool that estimates per-layer importance for memory-constrained scenarios. Achieves 1.5-1.8x throughput gains.
MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference
Academic Frontier: 6.5 points
Source: arxiv.org