Rating 6 · Source: cs.CL updates on arXiv.org · Published 2026-04-17
Rating rationale: Probes the boundary between internal knowledge and external expression in LLMs, important for understanding knowledge representation
arXiv:2604.14180v1 Announce Type: new Abstract: We train a 318M-parameter Transformer language model from scratch on a curated corpus of 1.56 billion tokens of pure Classical Chinese, with zero English characters or Arabic numerals. Through systematic out-of-distribution (OOD) testing, we investigate whether the model can distinguish known from unknown inputs, and crucially, whether it can express this distinction in its generated text. We find a clear dissociation between internal and external uncertainty.
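The abstract's core probe, whether a model's internal signal separates known from unknown inputs, can be illustrated with a deliberately tiny stand-in. The sketch below is not the paper's 318M Transformer or its actual probing method; it uses a character-level unigram model with add-alpha smoothing (all names and the toy corpus are hypothetical) to show how mean per-character negative log-likelihood, one simple internal uncertainty signal, tends to be higher on out-of-distribution text than on in-distribution text.

```python
import math
from collections import Counter

def train_unigram_nll(corpus: str, alpha: float = 1.0):
    """Fit a smoothed character unigram model; return a scorer that
    computes mean negative log-likelihood (NLL) per character."""
    counts = Counter(corpus)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen symbols

    def nll(text: str) -> float:
        # Higher mean NLL = the model finds the text more surprising.
        return sum(
            -math.log((counts.get(ch, 0) + alpha) / (total + alpha * vocab_size))
            for ch in text
        ) / max(len(text), 1)

    return nll

# Toy "Classical Chinese" training corpus (hypothetical, for illustration only).
nll = train_unigram_nll("学而时习之不亦说乎" * 20)

in_dist = nll("学而时习之")   # characters seen in training
ood = nll("abcde12345")       # English letters and digits: fully OOD here
print(in_dist < ood)          # → True: the internal signal separates the two
```

This only shows the internal side of the dissociation the abstract describes: the score distinguishes known from unknown inputs, but nothing forces the model's generated text to express that uncertainty.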