Scoring rationale: Documents how coding agents exploit public score feedback. An important warning for agent benchmarking practices.
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows
Industry News, score: 6.5
Source: arXiv