
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

Score 5.6 · Source: cs.AI updates on arXiv.org · Published 2026-04-08

Rating rationale: an AI research paper of some reference value

arXiv:2604.04988v1 Announce Type: cross

Abstract: Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock inference time. In particular, unstructured sparsity can reduce model storage while failing to accelerate (and sometimes slightly slowing down) standard CPU execution due to irregular memory access and sparse kernel overhead. Motivated by this gap between compression and acceleration, we study a practical, ordered pipeline that targets measured
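The abstract's central observation, that unstructured sparsity cuts stored parameters without necessarily cutting CPU wall-clock time, is easy to reproduce. The sketch below (my own illustration, not code from the paper) magnitude-prunes a single linear layer to 60% sparsity and times a dense matmul against a SciPy CSR sparse matmul; at moderate sparsity the sparse kernel's indexing overhead often cancels or exceeds the FLOP savings.

```python
# Sketch: why unstructured sparsity may not accelerate CPU inference.
# Hypothetical setup: one 1024x1024 linear layer, NumPy dense matmul
# vs. SciPy CSR sparse matmul. Layer size and 60% sparsity are assumptions.
import time
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
d = 1024
W = rng.standard_normal((d, d)).astype(np.float32)

# Unstructured magnitude pruning: zero out the 60% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.60)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)
W_csr = sparse.csr_matrix(W_pruned)  # ~40% of the dense storage

x = rng.standard_normal((d, 64)).astype(np.float32)

def bench(fn, reps=20):
    """Average wall-clock seconds per call after one warm-up run."""
    fn()
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

t_dense = bench(lambda: W_pruned @ x)   # dense BLAS ignores the zeros
t_sparse = bench(lambda: W_csr @ x)     # sparse kernel pays irregular-access cost

print(f"stored weights: {W_csr.nnz / W.size:.0%} of dense")
print(f"dense {t_dense * 1e3:.2f} ms  vs  sparse {t_sparse * 1e3:.2f} ms")
```

The two products are numerically identical, so only storage and timing differ; this is exactly the proxy-versus-wall-clock gap the paper's pipeline is designed around.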

