Tag: 推理 (reasoning/inference)
All the articles with the tag "推理" (reasoning/inference).
- 7.3
The illusion of reasoning: step-level evaluation reveals decorative chain-of-thought in frontier language models
arXiv:2603.22816v2 Announce Type: replace Abstract: Language models increasingly "show their work" by writing step-by-step reasoning before answeri...
- 6.7
URSA: The Universal Research and Scientific Agent
arXiv:2506.22653v2 Announce Type: replace Abstract: Large language models (LLMs) have moved far beyond their initial form as simple chatbots, now c...
- 6.7
Stop Fixating on Prompts: Reasoning Hijacking and Constraint Tightening for Red-Teaming LLM Agents
arXiv:2604.05549v1 Announce Type: new Abstract: With the widespread application of LLM-based agents across various domains, their complexity has in...
- 6.4
TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering
arXiv:2510.07432v2 Announce Type: replace Abstract: Large language models (LLMs) exhibit strong symbolic and compositional reasoning, yet they stru...
- 6.4
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
arXiv:2604.05681v1 Announce Type: cross Abstract: We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic ...
- 6.4
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
arXiv:2601.03054v4 Announce Type: replace-cross Abstract: Recent research on medical MLLMs has gradually shifted its focus from image-level underst...
- 6.4
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
arXiv:2604.06066v1 Announce Type: new Abstract: Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning ...
- 6.0
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
arXiv:2511.04570v2 Announce Type: replace-cross Abstract: The "Thinking with Text" and "Thinking with Images" paradigms significantly improve the r...
- 6.0
Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting
arXiv:2604.05540v1 Announce Type: new Abstract: Large language models (LLMs) can effectively handle outdated information through knowledge editing....
- 6.0
Automatic Replication of LLM Mistakes in Medical Conversations
arXiv:2512.20983v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly evaluated in clinical settings using multi-dimens...
- 6.0
Mechanistic Circuit-Based Knowledge Editing in Large Language Models
arXiv:2604.05876v1 Announce Type: new Abstract: Deploying Large Language Models (LLMs) in real-world dynamic environments raises the challenge of u...
- 8.3
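Single-Agent Systems Outperform Multi-Agent Systems in Multi-Hop Reasoning: An Information-Theoretic Argument Under a Fixed Token Budget
Under an equal thinking-token budget, single-agent systems are more information-efficient on multi-hop reasoning tasks; the performance gains of multi-agent systems come mainly from additional compute rather than from any architectural advantage.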
- 6.5
Google Gemini API Launches Flex and Priority Inference Tiers to Balance Cost and Reliability
Google introduces two inference modes, Flex and Priority, for the Gemini API.
- 6.8
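Former ByteDance Executive Founds 蓝芯算力: RISC-V AI Inference Chip Maker Raises Hundreds of Millions in Funding
Lu Shan (卢山), former head of chips at ByteDance, founded 蓝芯算力 to focus on AI inference chips built on the RISC-V architecture; the company has already secured orders for 200,000 chips from Lenovo, Tencent Cloud, and others.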
- 8.0
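Lin Junyang (林俊旸) Speaks Out for the First Time After Leaving: Qwen's Reasoning-Chain Direction Has a Fatal Technical Flaw
Lin Junyang, a former member of Alibaba's Qwen team, reflects publicly for the first time since his departure, arguing that stacking reasoning chains is the wrong direction and exposing key technical missteps in large-model training.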
- 7.4
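GTC Summit Dialogue, Jeff Dean x Bill Dally: The Pre-Training Paradigm Is Dead; the Next Frontier Is Inference and Systems
Google's Jeff Dean and NVIDIA's Bill Dally, in conversation at GTC 2026, say that the focus of AI development is shifting from pre-training to inference optimization and systems engineering.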
- 7.3
- 9.0
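Mistral Small 4 Released: 119B MoE Unifies Reasoning, Multimodal, and Coding Capabilities, Open Source Under Apache 2
Mistral releases the Small 4 model, a 119B-parameter (6B active) MoE architecture that for the first time unifies Magistral reasoning, Pixtral multimodal, and Devstral coding capabilities, under the Apache 2 license.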
- 7.5