
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Malicious Visual Attacks

Academic Frontier · Score 5.5 — VLM defense via early visual cortex alignment against visual attacks; timely as visual attacks grow

Score 5.5 · Source: cs.AI updates on arXiv.org · Published 2026-04-17


arXiv:2604.13803v1 · Announce Type: cross

Abstract: Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety.
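The abstract does not specify how model-brain alignment is quantified, but a standard tool for comparing a model's visual representations with neural recordings from areas like V1-V3 is representational similarity analysis (RSA). The sketch below is an illustration under assumed inputs, not the paper's method: it uses synthetic data in place of real V1-V3 responses and model activations, and Pearson rather than the Spearman correlation often used in RSA.

```python
import numpy as np

def rdm(features):
    # Representational dissimilarity matrix: for a (stimuli x units) array,
    # dissimilarity between two stimuli is 1 - Pearson correlation of their
    # feature vectors.
    return 1.0 - np.corrcoef(features)

def rsa_score(model_feats, neural_feats):
    # Compare the two RDMs by correlating their upper triangles
    # (each off-diagonal stimulus pair counted once).
    n = model_feats.shape[0]
    iu = np.triu_indices(n, k=1)
    a = rdm(model_feats)[iu]
    b = rdm(neural_feats)[iu]
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 50 stimuli, 200 simulated V1-V3 voxels.
neural = rng.normal(size=(50, 200))
# A "brain-aligned" model layer: a linear readout of the neural code.
aligned = neural @ rng.normal(size=(200, 64))
# An unrelated model layer with no shared structure.
unrelated = rng.normal(size=(50, 64))

print(rsa_score(aligned, neural))    # high: RDM structure is shared
print(rsa_score(unrelated, neural))  # near zero: no shared structure
```

Under the paper's framing, a defense study would compute such an alignment score per model and test whether higher alignment with early visual cortex predicts greater robustness to adversarial visual inputs.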