Tag: auditing
All the articles with the tag "auditing".
- 7.0
BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks
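The first automated auditing framework for LLM agent benchmarks, using frontier LLMs to systematically uncover flaws in the benchmarks themselves.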