
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

Academic Frontier · Score 5.5 · Data recipe optimization for multimodal midtraining; inspectable and adaptable recipes are practical for practitioners
Original: cs.LG updates on arXiv.org

Score 5.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

Scoring rationale: Data recipe optimization for multimodal midtraining; inspectable and adaptable recipes are practical for practitioners

arXiv:2604.14198v1 · Announce Type: new

Abstract: Domain reweighting can improve sample efficiency and downstream generalization, but data-mixture optimization for multimodal midtraining remains largely unexplored. Current multimodal training recipes tune mixtures along a single dimension, typically data format or task type. We introduce MixAtlas, a method that produces benchmark-targeted data recipes that can be inspected, adapted, and transferred to new corpora.