
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Academic Frontier · Score 3.2 · Moderate AI relevance +novelty(3)
Original: cs.CL updates on arXiv.org

Score 3.2 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

Scoring rationale: Moderate AI relevance +novelty(3)

arXiv:2507.22359v4 · Announce Type: replace-cross

Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical challenge due to data contamination, opaque operation, and subjective preferences. To address these issues, we propose League of LLMs (LOL), a novel benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. LOL integrates four…