Score 6.5 · Source: cs.LG updates on arXiv.org · Published 2026-04-17
Scoring rationale: Exploits high-magnitude Q/K activations in long contexts to guide RL; a novel intrinsic signal for long-context training
arXiv:2604.14922v1 Announce Type: new Abstract: Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model’s intrinsic representation characteristics to guide the training process. In this paper, we first observe the presence of high-magnitude activations within the query and key vectors when processing long contexts.
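
To make the observation concrete, here is a minimal sketch of one way to flag high-magnitude channels in a set of query (or key) vectors. It is not the paper's method: the median-based threshold, the function name `high_magnitude_channels`, the parameters `ratio` and `min_fraction`, and the synthetic data are all assumptions chosen for illustration.

```python
import random
import statistics


def high_magnitude_channels(vectors, ratio=10.0, min_fraction=0.5):
    """Return indices of channels that are unusually large in magnitude.

    Hypothetical rule (an assumption, not taken from the paper): an entry is
    "high-magnitude" if its absolute value exceeds `ratio` times the median
    absolute value over all entries. A channel is flagged if more than
    `min_fraction` of its entries are high-magnitude.
    """
    mags = [[abs(v) for v in row] for row in vectors]
    threshold = ratio * statistics.median(m for row in mags for m in row)
    n_rows = len(mags)
    n_dims = len(mags[0])
    flagged = []
    for d in range(n_dims):
        frac = sum(row[d] > threshold for row in mags) / n_rows
        if frac > min_fraction:
            flagged.append(d)
    return flagged


# Synthetic stand-in for one attention head's query vectors:
# 256 positions, 32 dimensions, with channel 7 artificially inflated.
rng = random.Random(0)
seq_len, head_dim = 256, 32
queries = [[rng.gauss(0.0, 1.0) for _ in range(head_dim)] for _ in range(seq_len)]
for row in queries:
    row[7] *= 50.0  # illustrative outlier channel, not real model data

outlier_channels = high_magnitude_channels(queries)
print(outlier_channels)
```

With a real model, the same function could be applied to query or key activations captured from an attention layer. How such activations are then used to guide RL training is not described in the excerpt above.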