Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning

Published

2026-04-22

Collected 2026-04-22 06:31

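Academic Frontier, score 6.0: integrates RL into DMD diffusion distillation and addresses the sample-scoring conflict problem, improving the trade-off between generation efficiency and quality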

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-22

The DMD + RL Conflict

Distribution Matching Distillation (DMD) is promising for few-step generation, but it often sacrifices quality for speed. Incorporating RL can improve quality, but:

A methodological contribution to diffusion distillation and model compression.