Scoring rationale: Uses Kalman filtering to improve GRPO advantage estimation. A novel combination of control theory and RL.
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning
Published
Collected
Industry News, 6.0 points
Source: arXiv