评分 6.4 · 来源:cs.AI updates on arXiv.org · 发布于 2026-04-08
评分依据:有一定参考价值的AI研究论文
arXiv:2603.09643v4 Announce Type: replace-cross Abstract: Current evaluation frameworks and benchmarks for LLM powered agents focus on text chat driven agents, these frameworks do not expose the persona of user to the agent, thus operating in a user agnostic environment. Importantly, in customer experience management domain, the agent’s behaviour evolves as the agent learns about user personality. With proliferation of real time TTS and multi-modal language models, LLM based agents are gradually going to become multi-modal. Towards this, we propose the MM-tau-p$^2$ benchmark with metrics for e