Rating rationale: Uses sparse autoencoders (SAEs) to dissociate uncertainty features from correctness features. A novel interpretability approach with implications for reliable AI deployment.
Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders
Industry News · Score: 6.5
Source: arXiv