Scoring rationale: Hierarchical Direct Preference Optimization (DPO) for complex reasoning tasks. Addresses the granularity limitation of standard DPO.
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs
Published
Collected
Industry News, score 6.5
Source: arXiv