Scoring rationale: Novel parallel self-refinement approach for test-time scaling beyond Best-of-N
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
Original article: arxiv.org