Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning

Published

April 23, 2026

Collected April 23, 2026, 00:00

Industry news · Score 6.0: Uses Kalman filtering to improve GRPO advantage estimation, a novel combination of control theory and RL.

Original source: arXiv

MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design

The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models