星际流动

Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders

Industry News · Score 6.5: Uses sparse autoencoders (SAEs) to dissociate uncertainty features from correctness features. A novel interpretability approach with implications for reliable AI deployment.
Original: arXiv

Score: 6.5 · Source: · Published:

Score rationale: Uses SAEs to dissociate uncertainty features from correctness features. A novel interpretability approach with implications for reliable AI deployment.