Score 6.5 · Source: cs.CL updates on arXiv.org · Published 2026-04-17
Scoring rationale: Important diagnostic work showing that prompt optimization is unpredictable in compound AI systems, with practical implications for development workflows.
arXiv:2604.14585v1 · Announce Type: cross
Abstract: Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot by up to $+6.8$ points.
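
To make the "coin flip" framing concrete, here is a minimal sketch of an exact two-sided binomial test against a fair coin. It is not the paper's own analysis. The count of 35 failing runs is an assumption: it is the only integer out of 72 that rounds to the reported 49%. The function name `two_sided_binomial_p` is hypothetical, and the test treats the 72 runs as independent, which repeats of the same method on the same task may not be.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k 'failures' in n trials under a fair coin (p = 0.5)."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n  # P(X >= k)
    # The distribution is symmetric at p = 0.5, so double the smaller tail.
    return min(1.0, 2 * min(lower, upper))


# Run count from the abstract: 6 methods x 4 tasks x 3 repeats.
n_runs = 6 * 4 * 3

# ASSUMPTION: 49% of 72 runs is taken to mean 35 runs below zero-shot.
n_below = round(0.49 * n_runs)

p_value = two_sided_binomial_p(n_below, n_runs)
print(f"{n_below}/{n_runs} runs below zero-shot, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value sits far above the conventional 0.05 threshold, so the observed failure rate gives no grounds to reject a 50/50 split between helping and hurting.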