You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

Published

2026-04-14

Collected 2026-04-14 04:31

Academic Frontier (score 6.0): Practical efficiency gain for reward models, with N-way scoring in a single forward pass. Useful infrastructure improvement for RLHF pipelines.

Source: cs.AI updates on arXiv.org

Score 6 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

Rating rationale: Practical efficiency gain for reward models, with N-way scoring in a single forward pass. Useful infrastructure improvement for RLHF pipelines.

Mem²Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation

EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems