星际流动

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

Academic Frontier · Score 6.4 — an AI research paper of some reference value

Source: cs.CL updates on arXiv.org · Published 2026-04-08

arXiv:2604.05483v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown a high capability in answering questions on a diverse range of topics. However, these models sometimes produce biased, ideological, or incorrect responses, which limits their applications when it is unclear on which topics their answers can be trusted. In this research, we introduce a novel algorithm, named GMRL-BD, designed to identify the untrustworthy boundaries (in terms of topics) of a given LLM, with black-box access to the LLM and under specific query constraints. Based on a general K
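The abstract's problem setup — probing a black-box LLM under a fixed query budget to find topics where its answers cannot be trusted — can be illustrated with a minimal sketch. This is not the paper's GMRL-BD algorithm (the abstract is truncated before the method is described); it is a hypothetical baseline in which the budget is split evenly across topics, per-topic error rates are estimated from graded answers, and topics above an error threshold are flagged. All names (`probe_untrustworthy_topics`, `fake_llm`, the error-rate table) are illustrative assumptions.

```python
import random

def probe_untrustworthy_topics(query_llm, topics, budget, error_threshold=0.3):
    """Spend a fixed query budget evenly across topics, estimate each topic's
    error rate from graded answers, and flag topics whose estimated error
    rate exceeds the threshold as untrustworthy."""
    per_topic = max(1, budget // len(topics))
    error_rates = {}
    for topic in topics:
        # query_llm(topic) returns True when the answer is judged correct.
        errors = sum(1 for _ in range(per_topic) if not query_llm(topic))
        error_rates[topic] = errors / per_topic
    return {t for t, rate in error_rates.items() if rate > error_threshold}

# Hypothetical black-box stand-in with known per-topic error rates,
# used only to exercise the probe under a deterministic seed.
random.seed(0)
ERROR_RATES = {"math": 0.05, "history": 0.10, "politics": 0.60}

def fake_llm(topic):
    return random.random() >= ERROR_RATES[topic]

flagged = probe_untrustworthy_topics(fake_llm, list(ERROR_RATES), budget=300)
print(flagged)
```

A uniform budget split is the simplest choice; the paper's query-constrained setting presumably motivates a smarter allocation (e.g., spending more queries near the suspected trust boundary), which this sketch does not attempt.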


Tags: