星际流动

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Academic Frontier · Score 6.0: Video generation as a multimodal reasoning paradigm
Original: cs.CL updates on arXiv.org

Score 6.0 · Source: cs.CL updates on arXiv.org · Published 2026-04-08

Scoring rationale: video generation as a multimodal reasoning paradigm

arXiv:2511.04570v2 Announce Type: replace-cross Abstract: The “Thinking with Text” and “Thinking with Images” paradigms significantly improve the reasoning abilities of large language models (LLMs) and Vision-Language Models (VLMs). However, these paradigms have inherent limitations: (1) images capture only single moments and fail to represent dynamic processes or continuous changes, and (2) text and vision are treated as distinct modalities, which hinders unified multimodal understanding and generation. Therefore, we propose “Thinking with Video”, a new paradigm that leverages video gen

