
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

Academic Frontier · Score 5.0: medium quality; a routine academic paper of moderate reference value
Original source: cs.AI updates on arXiv.org

Score 5.0 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

arXiv:2604.11666v1 · Announce Type: cross

Abstract: As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel privacy-themed ToM challenge, ToM for Steering Beliefs (ToM-SB), in which a defender must act as a Double Agent to steer the…