GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

Score 7 · Source: cs.CL updates on arXiv.org · Published 2026-04-20

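Scoring rationale: GTA-2 is a benchmark for general tool agents that spans atomic tool use through open-ended workflows, covering 11 subdomains and 177 tasks.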

Key points

arXiv:2604.15715v1 · Announce Type: new

Abstract: The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination. To address this, we propose GTA-2, a hierarchical benchmark for General Tool Agents (GTA) spanning atomic tool use and open-ended workflows. Built on real-world authenticity, it leverages real user queries, deployed tools, and mul…

🤖 AI commentary

This paper provides important information for the AI field and merits the attention of industry practitioners.