SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems

Published

2026-04-20

Collected 2026-04-20 09:04

Academic Frontier · Score 6.0 · SocialGrid: an embodied multi-agent social reasoning benchmark inspired by Among Us

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-20

Scoring basis: SocialGrid: an embodied multi-agent social reasoning benchmark inspired by Among Us

Key points

arXiv:2604.16022v1 (announce type: cross)

Abstract: As Large Language Models (LLMs) transition from text processors to autonomous agents, evaluating their social reasoning in embodied multi-agent settings becomes critical. We introduce SocialGrid, an embodied multi-agent environment inspired by Among Us that evaluates LLM agents on planning, task execution, and social reasoning. Our evaluations reveal that even the strongest open model (GPT-OSS-120B) achieves below 60% accuracy in task completion and planning, with agents getting stuck in repetitive behaviors or failing to navigate basic obstacl…

🤖 AI Commentary

This paper provides important information for the AI field and merits the attention of industry practitioners.