World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

Published

2026-04-17

Collected 2026-04-17 04:31

Academic Frontier, score 6.5: Implicit planning for VLA systems; addresses a key limitation of direct action prediction in embodied AI

Score 6.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

Scoring rationale: Implicit planning for VLA systems; addresses a key limitation of direct action prediction in embodied AI

arXiv:2604.14732v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language in action. However, most existing approaches rely on direct action prediction and cannot reason over long-horizon trajectories or evaluate their consequences, which limits performance in complex decision-making tasks. In this work, we introduce the World-Value-Action (WAV) model, a unified framework that enables implicit planning in VLA systems.