
MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets

Academic Frontier · Score 5.3 — Medium quality: standard academic paper, of moderate reference value
Source: cs.AI updates on arXiv.org

Score 5.3 · Source: cs.AI updates on arXiv.org · Published 2026-04-14



arXiv:2308.12067v3 Announce Type: replace-cross

Abstract: Multimodal large language models are typically trained in two stages: first pre-training on image-text pairs, and then fine-tuning using supervised vision-language instruction data. Recent studies have shown that large language models can achieve satisfactory results even with a limited amount of high-quality instruction-following data. In this paper, we introduce MM-LIMA, which is fine-tuned on a small dataset comprising only 200…
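The "less is more" idea in the abstract is that a small, carefully curated instruction set can substitute for a large noisy one in the fine-tuning stage. The paper's actual curation procedure is not shown in this excerpt; the sketch below is only an illustration of the general pattern, using a hypothetical `quality_score` heuristic to select a top-k subset of instruction-response examples before fine-tuning.

```python
# Illustrative sketch only (not the MM-LIMA method): select a small,
# high-quality subset of instruction data with a hypothetical heuristic.

def quality_score(example):
    """Hypothetical quality heuristic: prefer longer, well-formed responses.
    Real curation pipelines typically use model-based or human scoring."""
    response = example["response"]
    # Reward longer responses, capped at 50 words.
    length_bonus = min(len(response.split()), 50) / 50.0
    # Reward responses that end in terminal punctuation.
    punct_bonus = 0.5 if response.rstrip().endswith((".", "!", "?")) else 0.0
    return length_bonus + punct_bonus

def curate(dataset, k):
    """Keep only the k highest-scoring examples for supervised fine-tuning."""
    return sorted(dataset, key=quality_score, reverse=True)[:k]

if __name__ == "__main__":
    raw = [
        {"instruction": "Describe the image.",
         "response": "A dog runs across a sunny park."},
        {"instruction": "Describe the image.",
         "response": "dog"},
        {"instruction": "What color is the car?",
         "response": "The car in the photo is red."},
    ]
    small_set = curate(raw, k=2)
    print([ex["response"] for ex in small_set])
```

The curated subset would then be passed to a standard supervised fine-tuning loop; the abstract suggests that with high enough data quality, a few hundred such examples may suffice.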