METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models

Published

2026-04-14

Collected 2026-04-14 04:31

Academic Frontier · Score 5.2: medium quality, a routine academic paper of moderate reference value

Score 5.2 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

arXiv:2604.11502v1 · Announce Type: cross

Abstract: Contextual causal reasoning is a critical yet challenging capability for Large Language Models (LLMs). Existing benchmarks, however, often evaluate this skill in fragmented settings, failing to ensure context consistency or cover the full causal hierarchy. To address this, we pioneer METER to systematically benchmark LLMs across all three levels of the causal ladder under a unified context setting. Our extensive evaluation of various LLMs…
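The abstract refers to the three levels of the causal ladder: association, intervention, and counterfactual. As a minimal illustration of what "all three levels under a unified context" can mean, the sketch below fixes one hypothetical three-variable structural causal model (Z confounds X and Y) and answers one query per level against that same model. The variables, noise priors, and helper names (`world`, `association`, `intervention`, `counterfactual`) are illustrative assumptions, not METER's data or evaluation procedure.

```python
from fractions import Fraction
from itertools import product

# Hypothetical exogenous noise priors (illustrative only, not taken from METER).
P_U = {"uz": Fraction(1, 2), "ux": Fraction(1, 5), "uy": Fraction(1, 10)}

# Every joint assignment (uz, ux, uy) of the binary exogenous variables.
UNITS = list(product((0, 1), repeat=3))


def world(uz, ux, uy, do_x=None):
    """Evaluate the toy SCM for one exogenous assignment; do_x replaces X's equation."""
    z = uz
    x = z ^ ux if do_x is None else do_x
    y = (x & z) ^ uy
    return {"Z": z, "X": x, "Y": y}


def weight(uz, ux, uy):
    """Prior probability of one exogenous assignment (independent Bernoullis)."""
    w = Fraction(1)
    for name, val in (("uz", uz), ("ux", ux), ("uy", uy)):
        p = P_U[name]
        w *= p if val else 1 - p
    return w


def association():
    # Level 1: P(Y=1 | X=1), conditioning on passively observed X.
    num = sum(weight(*u) for u in UNITS if world(*u)["X"] == 1 and world(*u)["Y"] == 1)
    den = sum(weight(*u) for u in UNITS if world(*u)["X"] == 1)
    return num / den


def intervention():
    # Level 2: P(Y=1 | do(X=1)), forcing X=1 in every unit.
    return sum(weight(*u) for u in UNITS if world(*u, do_x=1)["Y"] == 1)


def counterfactual():
    # Level 3: P(Y_{X=0}=1 | X=1, Y=1).
    # Abduction: keep only units consistent with the observed evidence.
    evidence = [u for u in UNITS if world(*u)["X"] == 1 and world(*u)["Y"] == 1]
    den = sum(weight(*u) for u in evidence)
    # Action + prediction: rerun those same units with X forced to 0.
    num = sum(weight(*u) for u in evidence if world(*u, do_x=0)["Y"] == 1)
    return num / den


print(association(), intervention(), counterfactual())  # 37/50 1/2 1/37
```

The three answers differ even though the model and its noise priors never change: conditioning on X=1 picks up the confounding path through Z, the intervention cuts it, and the counterfactual additionally reuses the exogenous state inferred from the evidence. Exact `Fraction` arithmetic keeps the results free of floating-point rounding.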