
ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

Category: Academic frontier · Rating: 5.7 (upper-middle: some informational gain and reference value)

Source: cs.AI updates on arXiv.org · Published 2026-04-14


arXiv:2604.11080v1 Announce Type: cross Abstract: Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization of Large Language Models (LLMs). Global rotation methods achieve inference efficiency by fusing activation rotations into attention and FFN blocks, but suffer from limited expressivity as they are constrained to use a single learnable rotation matrix across all layers. To tackle this, layer-wise…
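For intuition, below is a minimal sketch of the general idea behind rotation-based PTQ that the abstract describes: an orthogonal rotation is fused into adjacent weight matrices so the activations the quantizer sees are rotated (spreading outlier energy across channels) while the floating-point computation is unchanged. This is an illustrative toy in NumPy under stated assumptions, not the paper's ReSpinQuant method; the random orthogonal matrix stands in for whatever learned rotation a real method would use.

```python
# Toy illustration of rotation-based PTQ (not the paper's implementation).
# An orthogonal rotation R is folded into the weights offline; the activation
# arriving at the layer is x @ R, whose outlier channel is spread out, so
# simple INT8 quantization loses less accuracy.
import numpy as np

def random_orthogonal(d, seed=0):
    """Random orthogonal matrix via QR (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize_int8(x):
    """Per-tensor symmetric INT8 quantize-dequantize."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

d = 8
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))
x = rng.standard_normal((4, d))
x[:, 0] *= 50.0  # an outlier channel that dominates the quantization scale

y_ref = x @ W                      # float reference
y_naive = quantize_int8(x) @ W     # quantize the raw (outlier-heavy) activation

R = random_orthogonal(d)
x_rot = x @ R        # rotated activation: outlier energy spread across channels
W_rot = R.T @ W      # rotation absorbed into the weights offline
y_rot = quantize_int8(x_rot) @ W_rot

print("naive quant error:  ", np.abs(y_ref - y_naive).mean())
print("rotated quant error:", np.abs(y_ref - y_rot).mean())
```

Because R is orthogonal, (x R)(Rᵀ W) equals x W exactly in floating point, so fusing the rotation only changes what the quantizer sees; in this toy the rotated variant should track the reference output noticeably more closely than naive quantization.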