星际流动

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

Academic Frontier · 6.5 points · Comprehensive study of VLM reasoning dynamics across 18 models, tracks confidence over CoT, valuable comparative data
Original: cs.LG updates on arXiv.org

Score 6.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

arXiv:2604.14888v1 · Announce Type: cross

Abstract: Recent advances in vision-language models (VLMs) offer reasoning capabilities, yet how these capabilities unfold and integrate visual and textual information remains unclear. We analyze reasoning dynamics in 18 VLMs, covering instruction-tuned and reasoning-trained models from two different model families. We track confidence over Chain-of-Thought (CoT), measure the corrective effect of reasoning, and evaluate the contribution of intermediate reasoning steps.
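
The abstract names three measurements without defining them. As a rough illustration only, the sketch below shows one plausible way such quantities could be computed from per-step confidences. Everything in it is an assumption rather than the paper's method: the `Trace` container, the reading of "confidence" as the probability assigned to the gold answer after each CoT prefix, the 0.5 correctness threshold, and the definition of the corrective effect as the net share of questions that reasoning fixes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    """Hypothetical record for one question.

    step_conf[0] is the assumed probability of the gold answer before any
    reasoning; step_conf[i] is that probability after the first i CoT steps.
    """
    step_conf: List[float]


def is_correct(conf: float, threshold: float = 0.5) -> bool:
    # Assumption: an answer counts as correct when the gold answer's
    # probability exceeds the threshold.
    return conf > threshold


def corrective_effect(traces: List[Trace], threshold: float = 0.5) -> float:
    """Net share of questions fixed by reasoning (assumed definition).

    Counts wrong-to-right flips minus right-to-wrong flips between the
    no-reasoning answer and the full-CoT answer, divided by the number
    of questions.
    """
    if not traces:
        raise ValueError("need at least one trace")
    fixed = broken = 0
    for t in traces:
        before = is_correct(t.step_conf[0], threshold)
        after = is_correct(t.step_conf[-1], threshold)
        if not before and after:
            fixed += 1
        elif before and not after:
            broken += 1
    return (fixed - broken) / len(traces)


def step_contributions(trace: Trace) -> List[float]:
    """Change in gold-answer confidence attributed to each CoT step."""
    return [b - a for a, b in zip(trace.step_conf, trace.step_conf[1:])]


# Toy data, invented for illustration (not results from the paper).
traces = [
    Trace([0.2, 0.4, 0.9]),  # reasoning fixes a wrong initial answer
    Trace([0.7, 0.6, 0.3]),  # reasoning breaks a correct initial answer
    Trace([0.1, 0.8, 0.8]),  # fixed at the first step, then stable
]
net_effect = corrective_effect(traces)
first_trace_deltas = step_contributions(traces[0])
```

Under these assumptions, the per-step deltas show where in the chain confidence moves, while the net flip rate summarizes whether reasoning helps or hurts overall. A real analysis would need the paper's actual definitions.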