星际流动

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

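Academic Frontier, score 7.0: GTA-2, a benchmark for general tool agents spanning atomic tool use to open-ended workflows, covering 11 subdomains and 177 tasks
Original source: cs.CL updates on arXiv.org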

Score: 7 · Source: cs.CL updates on arXiv.org · Published: 2026-04-20

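Scoring rationale: GTA-2 is a benchmark for general tool agents spanning atomic tool use to open-ended workflows, covering 11 subdomains and 177 tasks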

Key points

arXiv:2604.15715v1 Announce Type: new Abstract: The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination. To address this, we propose GTA-2, a hierarchical benchmark for General Tool Agents (GTA) spanning atomic tool use and open-ended workflows. Built on real-world authenticity, it leverages real user queries, deployed tools, and mul…

🤖 AI commentary

This paper offers important information for the AI field and merits the attention of industry practitioners.

