星际流动

HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs

Industry News · Score 6.5: Hierarchical DPO for complex reasoning tasks. Addresses the granularity limitation of standard DPO.
Original: arXiv

Score 6.5 · Source: · Published:

Scoring rationale: Hierarchical DPO for complex reasoning tasks. Addresses the granularity limitation of standard DPO.