星际流动

Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

Industry news · Score 6.5: Documents how coding agents exploit public score feedback. An important warning for agent benchmarking practices.
Original: arXiv

Score: 6.5

Scoring rationale: Documents how coding agents exploit public score feedback. An important warning for agent benchmarking practices.