
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

Academic Frontier, score 6.0: Reveals that GUI grounding models drop 27-56 pp under spatial-reasoning instructions and exposes a benchmark blind spot
Original: cs.LG updates on arXiv.org

Score 6 · Source: cs.LG updates on arXiv.org · Published 2026-04-17

Scoring rationale: Reveals that GUI grounding models drop 27-56 pp under spatial-reasoning instructions and exposes a benchmark blind spot

arXiv:2604.14262v1 · Announce Type: new

Abstract: GUI grounding models report over 85% accuracy on standard benchmarks, yet drop 27-56 percentage points when instructions require spatial reasoning rather than direct element naming. Current benchmarks miss this because they evaluate each screenshot once with a single fixed instruction. We introduce GUI-Perturbed, a controlled perturbation framework that independently varies visual scenes and instructions to measure grounding robustness.
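
To make the evaluation idea concrete, here is a minimal Python sketch of scoring a grounding model over a grid of (scene variant, instruction variant) cells instead of one fixed pairing per screenshot. It is not the GUI-Perturbed implementation: the `Sample` fields, the variant labels, the `predict` callable, and the point-in-box hit criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]              # predicted click location (x, y)


@dataclass(frozen=True)
class Sample:
    """One evaluation item. All field names are hypothetical."""
    scene: str        # scene-variant label, e.g. "original" or "perturbed"
    instruction: str  # instruction-variant label, e.g. "direct" or "spatial"
    screenshot: str   # identifier or path of the rendered screenshot
    prompt: str       # instruction text passed to the model
    target: Box       # ground-truth bounding box of the target element


def hit(point: Point, box: Box) -> bool:
    """Assumed success criterion: the predicted point lies inside the target box."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def accuracy_grid(
    samples: Iterable[Sample],
    predict: Callable[[str, str], Point],
) -> Dict[Tuple[str, str], float]:
    """Return accuracy for every (scene, instruction) cell present in samples."""
    hits: Dict[Tuple[str, str], int] = {}
    totals: Dict[Tuple[str, str], int] = {}
    for s in samples:
        key = (s.scene, s.instruction)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + int(hit(predict(s.screenshot, s.prompt), s.target))
    return {key: hits[key] / totals[key] for key in totals}


def drop_pp(
    grid: Dict[Tuple[str, str], float],
    baseline: Tuple[str, str],
    perturbed: Tuple[str, str],
) -> float:
    """Accuracy drop from the baseline cell to the perturbed cell, in percentage points."""
    return 100.0 * (grid[baseline] - grid[perturbed])
```

Because scene and instruction are separate axes, comparing two cells that differ along only one axis attributes the drop to that axis alone, which a single fixed-instruction pass over each screenshot cannot do.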