DyBBT: Dynamic Balance via Bandit-inspired Targeting for Dialog Policy with Cognitive Dual-Systems

Academic Frontier · Score 3.7: Moderate AI relevance +novelty(2) +practical(3)
Source: cs.CL updates on arXiv.org

Score 3.7 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

arXiv:2509.19695v3 · Announce Type: replace

Abstract: Task-oriented dialog systems often rely on static exploration strategies that do not adapt to dynamic dialog contexts, leading to inefficient exploration and suboptimal performance. We propose DyBBT, a novel dialog policy learning framework that formalizes the exploration challenge through a structured cognitive state space capturing dialog progression, user uncertainty, and slot dependency. DyBBT proposes a bandit-inspired meta-controller…