Mechanistic Decoding of Cognitive Constructs in LLMs

Published

April 17, 2026

Collected April 17, 2026, 04:31

Academic Frontiers · 5.5 points: Mechanistic interpretability approach to decoding cognitive constructs from LLMs; bridges cognitive science and MI

Score 5.5 · Source: cs.CL updates on arXiv.org · Published 2026-04-17


arXiv:2604.14593v1 · Announce Type: new

Abstract: While Large Language Models (LLMs) demonstrate increasingly sophisticated affective capabilities, the internal mechanisms by which they process complex emotions remain unclear. Existing interpretability approaches often treat models as black boxes or focus on coarse-grained basic emotions, leaving the cognitive structure of more complex affective states underexplored. To bridge this gap, we propose a Cognitive Reverse-Engineering framework based on Representation Engineering (RepE) to analyze social-comparison jealousy.
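RepE-style analyses generally begin by finding a direction in a model's hidden-state space that separates contrastive stimuli, then read a concept's strength by projecting new activations onto that direction. The sketch below illustrates that idea under stated assumptions: it presumes one hidden-state vector has already been extracted per prompt for jealousy-eliciting and neutral scenarios (layer choice and extraction are not shown), it uses a simple difference-of-means estimator that may differ from the paper's own procedure, and all names (`reading_vector`, `construct_score`) and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def reading_vector(pos_acts: Sequence[Sequence[float]],
                   neg_acts: Sequence[Sequence[float]]) -> Vector:
    """Unit-norm direction from the neutral-stimulus mean to the construct-stimulus mean.

    Hypothetical difference-of-means estimator; not claimed to be the paper's method.
    """
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("contrastive means coincide; no direction recoverable")
    return [d / norm for d in diff]


def construct_score(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of one hidden state onto the (unit-norm) reading vector."""
    return sum(a * d for a, d in zip(activation, direction))


if __name__ == "__main__":
    # Toy 3-d "hidden states": illustrative numbers, not real model activations.
    jealous = [[2.0, 0.5, 0.0], [1.8, 0.7, 0.1]]
    neutral = [[0.1, 0.6, 0.0], [-0.1, 0.4, 0.1]]
    direction = reading_vector(jealous, neutral)
    print(round(construct_score([1.5, 0.5, 0.0], direction), 3))
```

Because the direction is normalized, scores from different prompts are comparable on a single axis; a higher projection means the activation lies further toward the jealousy-eliciting cluster in this toy setup.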