Tag: efficient-LLM
All the articles with the tag "efficient-LLM".
- 6.0
Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs
Addresses three core challenges of MoE LLM inference on the Apple Neural Engine: dynamic tensor shapes, irregular operators, and memory fragmentation.