星际流动

Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning

Industry News · Score 6.0: Uses Kalman filtering to improve GRPO advantage estimation. A novel combination of control theory and RL.
Original: arXiv

Score: 6.0 · Source: · Published:

Scoring rationale: Uses Kalman filtering to improve GRPO advantage estimation. A novel combination of control theory and RL.