Score 6.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17
Scoring rationale: Implicit planning for VLA systems; addresses a key limitation of direct action prediction in embodied AI
arXiv:2604.14732v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action. However, most existing approaches rely on direct action prediction and cannot reason over long-horizon trajectories or evaluate their consequences, which limits performance in complex decision-making tasks. In this work, we introduce the World-Value-Action (WAV) model, a unified framework that enables implicit planning in VLA systems.