Tag: interpretability

All the articles with the tag "interpretability".

6.0
PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models
April 22, 2026
· cs.LG updates on arXiv.org · collected 04/22 14:31
PREF-XAI: a personalized rule-explanation method based on user preferences, goals, and cognitive constraints
8.8
Anthropic Deconstructs the LLM Persona Space: The "Assistant Axis" Study
March 16, 2026
· Anthropic Research
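New Anthropic research defines the "Assistant Axis" in terms of neural activations, reveals the internal mechanism behind LLM persona drift, and proposes an activation-capping scheme to stabilize model behavior.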
