星际流动

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Academic Frontier · 5.1 points · Medium quality: a routine academic paper of moderate reference value
Original: cs.AI updates on arXiv.org

Score 5.1 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

arXiv:2604.11778v1 · Announce Type: cross

Abstract: Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts, often termed general reasoning, remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning…