General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Published

2026-04-14

Collected 2026-04-14 04:31

Academic Frontier, score 5.1. Medium quality: a routine academic paper with moderate reference value

Score 5.1 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

Scoring rationale: medium quality; a routine academic paper with moderate reference value

arXiv:2604.11778v1 Announce Type: cross

Abstract: Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts, often termed general reasoning, remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning…