Tag: Reinforcement Learning (强化学习)
All articles with the tag "Reinforcement Learning" (强化学习).
- 6.7
URSA: The Universal Research and Scientific Agent
arXiv:2506.22653v2 Announce Type: replace Abstract: Large language models (LLMs) have moved far beyond their initial form as simple chatbots, now c...
- 6.4
Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis
arXiv:2603.19282v2 Announce Type: replace Abstract: In many real-world applications, large language models (LLMs) operate as independent agents wit...
- 6.4
Frame of Reference: Addressing the Challenges of Common Ground Representation in Situational Dialogs
arXiv:2601.09365v2 Announce Type: replace Abstract: Common ground plays a critical role in situated spoken dialogs, where interlocutors must establ...
- 6.4
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv:2603.21357v2 Announce Type: replace-cross Abstract: LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% ...
- 6.4
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
arXiv:2604.05483v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown a high capability in answering questions on a diverse ran...
- 6.4
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
arXiv:2604.06066v1 Announce Type: new Abstract: Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning ...
- 6.4
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
arXiv:2604.05808v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive deci...
- 6.4
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning
arXiv:2604.05943v1 Announce Type: new Abstract: Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous ...
- 6.4
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
arXiv:2604.06132v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents executing multi-step workflows...
- 6.3
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models
arXiv:2604.05995v1 Announce Type: cross Abstract: Large Language Models (LLMs) internalize vast world knowledge as parametric memory, yet inevitabl...
- 6.0
Mechanistic Circuit-Based Knowledge Editing in Large Language Models
arXiv:2604.05876v1 Announce Type: new Abstract: Deploying Large Language Models (LLMs) in real-world dynamic environments raises the challenge of u...
- 8.3
GrandCode: Multi-Agent RL System Reaches Grandmaster Level in Competitive Programming
Through coordinated RL training of multiple agent modules for hypothesis proposal, test generation, and solution selection, GrandCode surpasses Gemini 3 Deep Think in competitive programming, reaching human Grandmaster level for the first time.
- 7.0
Multi-Turn Reinforcement Learning for Tool-Calling Agents: MT-GRPO and Iterative Reward Calibration
The first work to combine MT-GRPO with GTPO for training tool-calling agents; it finds that rule-based dense rewards are more stable than LLM judges and proposes an iterative reward-calibration method.
- 7.0
Reaching Beyond the Mode: Reinforcement Learning for Distributional Inference in Language Models
Uses RL to train language models to output a distribution over multiple answers rather than a single best answer, addressing current models' limitations in uncertainty-laden scenarios such as medical diagnosis.
- 7.7
PrismAudio: A 518M-Parameter Model Beats Multi-Billion-Parameter Models, Setting a New SOTA for Chinese-Developed Multimodal Audio Generation
Alibaba Tongyi, in collaboration with HKUST, releases PrismAudio, the first system to integrate RL and CoT planning into video dubbing generation
- 7.4
Scaling RL for Code Generation: Synthetic Data and Curriculum Learning in Practice
A teacher iteratively refines problems based on student performance, building a structured difficulty progression without any teacher fine-tuning.
- 7.4
Reward Is Enough: Reinforcement Learning Capabilities Emerge in LLMs at Inference Time
Reveals that RL-like behavior emerges naturally in LLMs at inference time, enabling self-improvement through multi-turn prompting alone.
- 6.7
3D-Layout-R1: Structured Spatial Layout Editing via Scene-Graph Reasoning
3D-Layout-R1 uses scene-graph reasoning and reinforcement learning for text-driven spatial layout editing, improving IoU by 15% over CoT-SFT methods and applying structured spatial reasoning to 3D scene editing for the first time.
- 6.7
OS-Themis: A Scalable Multi-Agent Judging Framework That Improves GUI Agent RL Training by 10.3%
Decomposes GUI agent trajectories into verifiable milestones and builds high-quality reward functions through a multi-agent review mechanism.
- 7.1
MiniMax M2.7 Released: The Model Participates in 30-50% of Its Own Training Pipeline
MiniMax introduces the self-evolving LLM M2.7, which autonomously handles R&D steps such as training debugging and metric analysis, achieving a 66.6% medal rate on MLE Bench Lite.