Score 3.2 · Source: cs.CL updates on arXiv.org · Published 2026-04-15
Scoring basis: Moderate AI relevance + novelty (3)
arXiv:2507.22359v4 Announce Type: replace-cross Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical challenge due to data contamination, opaque operation, and subjective preferences. To address these issues, we propose League of LLMs (LOL), a novel benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. LOL integrates four…