星际流动

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

Academic Frontier · Score 5.5

Score 5.5 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

Rating rationale: Above average; offers some new information and reference value.


arXiv:2604.11539v1 · Announce Type: cross

Abstract: Human perception of visual similarity is inherently adaptive and subjective, depending on the users’ interests and focus. However, most image retrieval systems fail to reflect this flexibility, relying on a fixed, monolithic metric that cannot incorporate multiple conditions simultaneously. To address this, we propose CLAY, an adaptive similarity computation method that reframes the embedding space of pretrained Vision-Language Models (VLMs) as…
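The abstract is cut off before it explains how CLAY actually reframes the embedding space, so the sketch below is not CLAY's method. It is a generic illustration of the contrast the abstract draws between a fixed, monolithic metric and a condition-dependent one. It rests on several assumptions: condition embeddings (for example, text embeddings of "color" or "shape") live in the same space as the image embeddings, and one plausible way to "modulate" similarity is to project both image embeddings onto the subspace spanned by the active conditions before taking cosine similarity. All function names are hypothetical, and toy 3-d vectors stand in for real VLM embeddings.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Fixed, condition-independent similarity (plain cosine)."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def orthonormal_basis(conditions: Sequence[Sequence[float]], eps: float = 1e-9) -> List[Vector]:
    """Gram-Schmidt over the condition vectors; redundant conditions are dropped."""
    basis: List[Vector] = []
    for c in conditions:
        v = list(c)
        for q in basis:
            p = dot(v, q)
            v = [x - p * y for x, y in zip(v, q)]
        n = math.sqrt(dot(v, v))
        if n > eps:
            basis.append([x / n for x in v])
    return basis


def conditional_similarity(
    img_a: Sequence[float],
    img_b: Sequence[float],
    conditions: Sequence[Sequence[float]],
) -> float:
    """Hypothetical condition-modulated similarity: cosine after projecting
    both embeddings onto the span of the condition vectors. Coordinates in an
    orthonormal basis preserve inner products, so comparing them is the same
    as comparing the projected vectors."""
    basis = orthonormal_basis(conditions)
    if not basis:
        return cosine(img_a, img_b)  # no condition: fall back to the fixed metric
    pa = [dot(img_a, q) for q in basis]
    pb = [dot(img_b, q) for q in basis]
    return cosine(pa, pb)


if __name__ == "__main__":
    # Toy axes (assumed): dim 0 ~ "color", dim 2 ~ "shape".
    a = [1.0, 0.0, 1.0]
    b = [1.0, 0.0, -1.0]
    color, shape = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
    print(cosine(a, b))                                   # fixed metric
    print(conditional_similarity(a, b, [color]))          # same color
    print(conditional_similarity(a, b, [shape]))          # opposite shape
    print(conditional_similarity(a, b, [color, shape]))   # both conditions
```

Under the fixed metric the two toy images score 0.0. Conditioning on "color" alone makes them identical (1.0), conditioning on "shape" alone makes them opposite (-1.0), and conditioning on both recovers 0.0. The same pair can therefore be "similar" or "dissimilar" depending on the user's focus, which is the flexibility the abstract says fixed metrics lack. Projection is only one illustrative choice; a learned weighting or a different modulation scheme would fit the same interface.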