
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

Score 5.6 · Source: cs.AI updates on arXiv.org · Published 2026-04-08

Rating rationale: an AI research paper of some reference value

arXiv:2604.04988v1 Announce Type: cross

Abstract: Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock inference time. In particular, unstructured sparsity can reduce model storage while failing to accelerate (and sometimes slightly slowing down) standard CPU execution due to irregular memory access and sparse kernel overhead. Motivated by this gap between compression and acceleration, we study a practical, ordered pipeline that targets measured
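The abstract's central observation, that unstructured sparsity cuts stored parameters without necessarily cutting CPU wall-clock time, is easy to reproduce. The sketch below (my own illustration, not code from the paper) magnitude-prunes a single linear layer to 60% sparsity and times a dense matmul against a SciPy CSR sparse matmul; at moderate sparsity the sparse kernel's indexing overhead often cancels or exceeds the FLOP savings.

```python
# Sketch: why unstructured sparsity may not accelerate CPU inference.
# Hypothetical setup: one 1024x1024 linear layer, NumPy dense matmul
# vs. SciPy CSR sparse matmul. Layer size and 60% sparsity are assumptions.
import time
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
d = 1024
W = rng.standard_normal((d, d)).astype(np.float32)

# Unstructured magnitude pruning: zero out the 60% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.60)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)
W_csr = sparse.csr_matrix(W_pruned)  # ~40% of the dense storage

x = rng.standard_normal((d, 64)).astype(np.float32)

def bench(fn, reps=20):
    """Average wall-clock seconds per call after one warm-up run."""
    fn()
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

t_dense = bench(lambda: W_pruned @ x)   # dense BLAS ignores the zeros
t_sparse = bench(lambda: W_csr @ x)     # sparse kernel pays irregular-access cost

print(f"stored weights: {W_csr.nnz / W.size:.0%} of dense")
print(f"dense {t_dense * 1e3:.2f} ms  vs  sparse {t_sparse * 1e3:.2f} ms")
```

The two products are numerically identical, so only storage and timing differ; this is exactly the proxy-versus-wall-clock gap the paper's pipeline is designed around.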

