Rating rationale: An RL framework for function calling with reasoning-aware rewards. Practical for tool-use agent development.
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
Published
Collected
Industry News, score: 6.0
Original: arXiv