Rating 5.0 · Source: cs.AI updates on arXiv.org · Published 2026-04-14
Rating rationale: Medium quality: a standard academic paper of moderate reference value
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
arXiv:2604.11666v1 Announce Type: cross Abstract: As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and mental states of their dialogue partners (i.e., to form and use a theory of mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel privacy-themed ToM challenge, ToM for Steering Beliefs (ToM-SB), in which a defender must act as a double agent to steer the…