FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

Published

2026-04-17

Collected 2026-04-17 04:31

Academic Frontier · Score 5.5: A real-world field work benchmark for agents that moves beyond digital-only agent evaluation

Score 5.5 · Source: cs.AI updates on arXiv.org · Published 2026-04-17

Scoring rationale: A real-world field work benchmark for agents that moves beyond digital-only agent evaluation

arXiv:2505.19662v3 Announce Type: replace Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. As demand for agentic AI grows, agents are being built to detect and document safety hazards, procedural violations, and other critical incidents across real-world manufacturing and retail environments. Whereas most agentic AI benchmarks focus on performance in simulated or digital environments, our work addresses the fundamental challenge of evaluating agents in the real world.