Scoring rationale: Revisits the OPD formulation with theoretical analysis and empirical fixes for post-training
Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
Academic Frontiers, score 7.0
Source: arxiv.org