Score 5.1 · Source: cs.AI updates on arXiv.org · Published 2026-04-14
Scoring rationale: medium quality; a routine academic paper with moderate reference value
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
arXiv:2604.11778v1 Announce Type: cross Abstract: Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts—often termed general reasoning—remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning…