EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents

Published: 2026-04-08

Collected: 2026-04-08 04:31

Academic Frontier · 6.4 points: an AI research paper with some reference value

Score 6.4 · Source: cs.CL updates on arXiv.org · Published 2026-04-08

arXiv:2604.05557v1 · Announce Type: new

Abstract: Scientific research follows multi-turn, multi-step workflows that require proactively searching the literature, consulting figures and tables, and integrating evidence across papers to align experimental settings and support reproducible conclusions. This joint capability is not systematically assessed in existing benchmarks, which largely under-evaluate proactive search, multi-evidence integration, and sustained evidence use over time. In this work, we introduce EpiBench, an episodic multi-turn multimodal benchmark that instantiates short researc