Intrinsic Mutual Information as a Modulator for Preference Optimization

Published: April 29, 2026

Collected: April 29, 2026, 06:31

Academic Frontier · Score 5.5: uses mutual information to modulate DPO hyperparameters, reducing tuning overhead

Source: arXiv cs.LG

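Offline preference optimization methods such as DPO offer significant advantages but require extensive hyperparameter tuning. This paper proposes using mutual information as a modulating factor that dynamically adjusts hyperparameters according to the training state.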
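To make the idea concrete, here is a minimal Python sketch of where such a modulator could plug into the standard per-pair DPO loss. The loss itself follows the usual DPO formulation. Everything about the modulation is an assumption for illustration only: this summary does not say how the mutual-information signal is estimated or how it maps onto hyperparameters, so `modulated_beta`, its `signal` argument, and the scaling rule are hypothetical placeholders, not the paper's method.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta):
    """Standard per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # Margin between the policy-vs-reference log-ratios of chosen and rejected responses.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def modulated_beta(base_beta, signal):
    """Hypothetical rule (assumption, not from the paper): scale beta by a
    training-state signal clipped to [0, 1], giving a factor in [0.5, 1.5]."""
    clipped = min(max(signal, 0.0), 1.0)
    return base_beta * (0.5 + clipped)


# Usage: recompute beta from the current signal at each step, then evaluate the loss.
beta = modulated_beta(0.1, signal=0.5)
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta)
```

The point of the sketch is only the structure: a fixed `beta` is replaced by a value recomputed from a signal that tracks the training state, which is the kind of manual tuning the paper aims to reduce.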
