Score: 5.5 · Source: cs.AI updates on arXiv.org · Published: 2026-04-17
Rating rationale: VLA defense via early visual cortex alignment against visual attacks; timely as vision attacks grow
arXiv:2604.13803v1 Announce Type: cross Abstract: Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question, with implications for both neuroscience and AI safety.