星际流动

MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference

Academic Frontier, score 6.5: A practical deployment tool that estimates per-layer importance for memory-constrained scenarios, with reported throughput gains of 1.5-1.8x.
Original: arxiv.org

Score 6.5 · Source: · Published 2026-04-27

Rating basis: A practical deployment tool that estimates per-layer importance for memory-constrained scenarios, with reported throughput gains of 1.5-1.8x.