Calibrated Confidence Estimation for Tabular Question Answering

Published

April 15, 2026

Collected April 15, 2026, 04:35

Academic Frontier score: 3.2 (Moderate AI relevance +novelty(1) +practical(1))

Score 3.2 · Source: cs.CL updates on arXiv.org · Published 2026-04-15

arXiv:2604.12491v1 (announce type: new)

Abstract: Large language models (LLMs) are increasingly deployed for tabular question answering, yet calibration on structured data is largely unstudied. This paper presents the first systematic comparison of five confidence estimation methods across five frontier LLMs and two tabular QA benchmarks. All models are severely overconfident (smooth ECE 0.35-0.64 versus 0.10-0.15 reported for textual QA). A consistent self-evaluation versus perturbation…
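For readers unfamiliar with the metric, the sketch below shows how an expected calibration error (ECE) is commonly computed. It is an illustration only: it uses the standard equal-width binned ECE, not the smooth ECE reported in the abstract, and the function name `binned_ece`, its parameters, and the toy inputs are hypothetical rather than taken from the paper.

```python
def binned_ece(confidences, correct, n_bins=10):
    """Equal-width binned ECE (an assumption; the paper reports smooth ECE).

    confidences: stated confidence per answer, each in [0, 1]
    correct:     1 if the answer was right, 0 otherwise
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions by confidence bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece


# Toy data (hypothetical): 90% stated confidence, only 1 of 4 answers correct.
overconfident = binned_ece([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0])
calibrated = binned_ece([1.0, 1.0], [1, 1])
```

In this toy case the gap between stated confidence (0.9) and accuracy (0.25) gives an ECE of about 0.65, which is the kind of gap "overconfident" describes. A model whose confidence matches its accuracy scores 0.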