Scoring rationale: Proposes extending MoE expert lifespan using the options framework. Addresses a real MoE memory/churn problem at scale.
Temporally Extended Mixture-of-Experts Models
Published
Collected
Industry News, 7.0 points
Original: arXiv