Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-20
Scoring basis: SocialGrid: an Among Us-inspired embodied multi-agent benchmark for social reasoning
Key points
arXiv:2604.16022v1 Announce Type: cross Abstract: As Large Language Models (LLMs) transition from text processors to autonomous agents, evaluating their social reasoning in embodied multi-agent settings becomes critical. We introduce SocialGrid, an embodied multi-agent environment inspired by Among Us that evaluates LLM agents on planning, task execution, and social reasoning. Our evaluations reveal that even the strongest open model (GPT-OSS-120B) achieves below 60% accuracy in task completion and planning, with agents getting stuck in repetitive behaviors or failing to navigate basic obstacl…
🤖 AI Commentary
This paper offers important information for the AI field and merits attention from industry practitioners.