
OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

Academic Frontier · Score 3.2: Moderate AI relevance +practical(2)
Original source: cs.LG updates on arXiv.org

Published: 2026-04-15


arXiv:2604.12782v1 (Announce Type: new)

Abstract: While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation due to the restricted dynamic range of low-bit formats. In this paper, we systematically investigate the spatial distribution of outliers and demonstrate a token-persistent structural clustering effect, where high-magnitude outliers consistently occupy fixed channels across tokens.…
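To make the dynamic-range problem concrete, here is a minimal sketch on synthetic data. It is not the OSC method, which the truncated abstract does not fully describe. It assumes per-token symmetric INT4 fake-quantization with integers in [-7, 7], and it injects large values into a fixed set of channels to mimic the token-persistent pattern. All function names, the channel indices, and the thresholds (`ratio`, `min_fraction`) are hypothetical. The "separation" step simply keeps flagged channels in full precision, which is a generic mixed-precision split chosen for illustration and not necessarily how the paper handles them.

```python
import random


def quantize_int4_symmetric(values):
    """Fake-quantize one vector to symmetric INT4 and dequantize it back.

    One scale per vector (here: per token); the largest magnitude sets the
    scale, so a single outlier coarsens the grid for every other channel.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / 7.0
    return [max(-7, min(7, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def make_activations(num_tokens, num_channels, outlier_channels, seed=0):
    """Hypothetical synthetic activations: unit Gaussian noise everywhere,
    plus a +/-50 offset in the same channels for every token."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_tokens):
        row = [rng.gauss(0.0, 1.0) for _ in range(num_channels)]
        for c in outlier_channels:
            row[c] += rng.choice((-1.0, 1.0)) * 50.0
        rows.append(row)
    return rows


def find_persistent_outlier_channels(rows, ratio=10.0, min_fraction=0.9):
    """Flag channels whose magnitude exceeds `ratio` times the token's median
    magnitude in at least `min_fraction` of all tokens (assumed heuristic)."""
    num_channels = len(rows[0])
    counts = [0] * num_channels
    for row in rows:
        mags = sorted(abs(v) for v in row)
        median = mags[len(mags) // 2]  # upper median; fine for a threshold
        for c, v in enumerate(row):
            if abs(v) > ratio * median:
                counts[c] += 1
    return [c for c in range(num_channels) if counts[c] >= min_fraction * len(rows)]


def quantize_with_separation(row, outlier_channels):
    """Keep flagged channels in full precision; INT4-quantize the rest."""
    keep = set(outlier_channels)
    inliers = [v for c, v in enumerate(row) if c not in keep]
    q_inliers = iter(quantize_int4_symmetric(inliers))
    return [v if c in keep else next(q_inliers) for c, v in enumerate(row)]


if __name__ == "__main__":
    true_outliers = [3, 17, 40, 58]  # hypothetical channel indices
    acts = make_activations(128, 64, true_outliers)
    detected = find_persistent_outlier_channels(acts)
    naive = sum(mse(r, quantize_int4_symmetric(r)) for r in acts) / len(acts)
    split = sum(mse(r, quantize_with_separation(r, detected)) for r in acts) / len(acts)
    print("detected outlier channels:", detected)
    print(f"per-token INT4 MSE:               {naive:.4f}")
    print(f"INT4 MSE, outlier channels split: {split:.4f}")
```

With this setup the detector should flag exactly the four injected channels. The split error should also be far lower, because removing the outliers shrinks the INT4 step size for the remaining channels. The exact numbers depend on the seed. Under this sketch's assumptions, channel persistence means the outlier set can be found once on calibration data and reused for every token, rather than searched for per token.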