
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Malicious Visual Attacks

Academic Frontier · Score 5.5 — VLM defense via early visual cortex alignment against visual attacks; timely as visual attacks grow

Score 5.5 · Source: cs.AI updates on arXiv.org · Published 2026-04-17


arXiv:2604.13803v1 · Announce Type: cross

Abstract: Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety.
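The abstract does not specify how model-brain alignment is quantified, but a standard tool for comparing a model's visual representations with neural recordings from areas like V1-V3 is representational similarity analysis (RSA). The sketch below is an illustration under assumed inputs, not the paper's method: it uses synthetic data in place of real V1-V3 responses and model activations, and Pearson rather than the Spearman correlation often used in RSA.

```python
import numpy as np

def rdm(features):
    # Representational dissimilarity matrix: for a (stimuli x units) array,
    # dissimilarity between two stimuli is 1 - Pearson correlation of their
    # feature vectors.
    return 1.0 - np.corrcoef(features)

def rsa_score(model_feats, neural_feats):
    # Compare the two RDMs by correlating their upper triangles
    # (each off-diagonal stimulus pair counted once).
    n = model_feats.shape[0]
    iu = np.triu_indices(n, k=1)
    a = rdm(model_feats)[iu]
    b = rdm(neural_feats)[iu]
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 50 stimuli, 200 simulated V1-V3 voxels.
neural = rng.normal(size=(50, 200))
# A "brain-aligned" model layer: a linear readout of the neural code.
aligned = neural @ rng.normal(size=(200, 64))
# An unrelated model layer with no shared structure.
unrelated = rng.normal(size=(50, 64))

print(rsa_score(aligned, neural))    # high: RDM structure is shared
print(rsa_score(unrelated, neural))  # near zero: no shared structure
```

Under the paper's framing, a defense study would compute such an alignment score per model and test whether higher alignment with early visual cortex predicts greater robustness to adversarial visual inputs.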