Mamba-3 introduces improved sequence modeling using state space principles. The model achieves comparable perplexity to Mamba-2 despite using half of its predecessor’s state size, advancing the performance-efficiency Pareto frontier.
Mamba-3: Improved Sequence Modeling using State Space Principles
Release
Score: 7.6
An important architectural improvement and new progress for sub-quadratic models; valuable to researchers.