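Score 7 · Source: cs.CL updates on arXiv.org · Published 2026-04-20
Scoring rationale: GTA-2, a general tool agent benchmark spanning atomic tool use to open-ended workflows, covering 11 subdomains and 177 tasks
Key points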
arXiv:2604.15715v1 Announce Type: new Abstract: The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination. To address this, we propose GTA-2, a hierarchical benchmark for General Tool Agents (GTA) spanning atomic tool use and open-ended workflows. Built on real-world authenticity, it leverages real user queries, deployed tools, and mul…
🤖 AI commentary
This paper provides important information for the AI field and merits attention from industry practitioners.