Score: 5.5 · Source: cs.LG updates on arXiv.org · Published: 2026-04-17
Rationale: Data-recipe optimization for multimodal midtraining; recipes that can be inspected and adapted are practical for practitioners.
arXiv:2604.14198v1 (Announce Type: new)
Abstract: Domain reweighting can improve sample efficiency and downstream generalization, but data-mixture optimization for multimodal midtraining remains largely unexplored. Current multimodal training recipes tune mixtures along a single dimension, typically data format or task type. We introduce MixAtlas, a method that produces benchmark-targeted data recipes that can be inspected, adapted, and transferred to new corpora.
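To make the "single dimension" point concrete, a data recipe can be pictured as a table of weights over joint (data format, task type) cells. Tuning only one axis fixes a marginal distribution, while a joint table controls both axes at once. The Python below is a minimal sketch under that assumption. It is not MixAtlas, whose optimization procedure the abstract does not describe. The cell names, the weights, and the helpers `normalize`, `marginal`, and `sample_cells` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical recipe: non-negative weights over (data_format, task_type) cells.
# Cell names and values are illustrative only, not taken from the paper.
recipe = {
    ("image-caption", "captioning"): 2.0,
    ("image-caption", "ocr"): 1.0,
    ("interleaved", "captioning"): 1.0,
    ("interleaved", "vqa"): 3.0,
    ("text-only", "reasoning"): 1.0,
}

def normalize(weights):
    """Turn non-negative weights into sampling probabilities that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {cell: w / total for cell, w in weights.items()}

def marginal(probs, axis):
    """Collapse the joint mixture onto one dimension (0 = format, 1 = task)."""
    out = Counter()
    for cell, p in probs.items():
        out[cell[axis]] += p
    return dict(out)

def sample_cells(probs, n, seed=0):
    """Draw n (format, task) cells in proportion to the mixture weights."""
    rng = random.Random(seed)
    cells = list(probs)
    return rng.choices(cells, weights=[probs[c] for c in cells], k=n)

probs = normalize(recipe)
print(marginal(probs, 0))  # {'image-caption': 0.375, 'interleaved': 0.5, 'text-only': 0.125}
```

Because the recipe is stored as an explicit table, it stays inspectable: you can read each cell's share, and you can re-weight it for a new corpus by editing entries. This is a design assumption of the sketch, not a claim about how MixAtlas represents its recipes.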