Score 5.5 · Source: cs.CL updates on arXiv.org · Published 2026-04-17
Scoring rationale: A mechanistic interpretability (MI) approach to decoding cognitive constructs from LLMs; bridges cognitive science and MI
arXiv:2604.14593v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate increasingly sophisticated affective capabilities, the internal mechanisms by which they process complex emotions remain unclear. Existing interpretability approaches often treat models as black boxes or focus on coarse-grained basic emotions, leaving the cognitive structure of more complex affective states underexplored. To bridge this gap, we propose a Cognitive Reverse-Engineering framework based on Representation Engineering (RepE) to analyze social-comparison jealousy.