League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Published

2026-04-15

Collected 2026-04-15 04:35

Academic Frontier · Score 3.2 · Moderate AI relevance +novelty(3)

Score 3.2 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

arXiv:2507.22359v4 · Announce Type: replace-cross

Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical challenge due to data contamination, opaque operation, and subjective preferences. To address these issues, we propose League of LLMs (LOL), a novel benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. LOL integrates four…