Posts
All the articles I've posted.
- 6.3
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
arXiv:2604.15741v1 Announce Type: new Abstract: Uncertainty estimation is a promising approach to detect hallucinations in large language models (LLMs). Recent approaches commonly depend on model inte…
- 6.3
Preference Estimation via Opponent Modeling in Multi-Agent Negotiation
arXiv:2604.15687v1 Announce Type: new Abstract: Automated negotiation in complex, multi-party and multi-issue settings critically depends on accurate opponent modeling. However, conventional numerical…
- 6.3
AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
arXiv:2604.16207v1 Announce Type: cross Abstract: As forgery types continue to emerge consistently, Incremental Face Forgery Detection (IFFD) has become a crucial paradigm. However, existing methods t…
- 6.3
ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis
arXiv:2604.16205v1 Announce Type: cross Abstract: Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic s…
- 6.3
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
arXiv:2604.15648v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) consistently require new arenas to guide their expanding boundaries, yet their capabilities with hypergraphs remain…
- 6.3
DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation
arXiv:2604.15593v1 Announce Type: new Abstract: Large language models compress heterogeneous knowledge into a single parameter space, allowing facts from different domains to interfere during generati…
- 6.3
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
arXiv:2604.15871v1 Announce Type: cross Abstract: The evaluation of visual editing models remains fragmented across methods and modalities. Existing benchmarks are often tailored to specific paradigms…
- 6.3
The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning
arXiv:2604.15695v1 Announce Type: cross Abstract: Cooperative equilibria are fragile. When agents learn alongside each other rather than in a fixed environment, the process of learning destabilizes th…
- 6.0
COMPASS: Benchmarking Constrained Optimization in LLM Agents
arXiv:2510.07043v2 Announce Type: replace Abstract: Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks like travel planning, shop…
- 6.0
The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring
arXiv:2604.15702v1 Announce Type: cross Abstract: We introduce a cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework…
- 6.0
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
arXiv:2604.15622v1 Announce Type: cross Abstract: Language-aligned vision foundation models (VFMs) enable versatile visual understanding for always-on contextual AI, but their deployment on edge devic…
- 6.0
When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems
arXiv:2604.15343v1 Announce Type: cross Abstract: We report a detailed autoethnographic case study of a single subject who deliberately constructed and operated a multi-modal prompt-engineering system…
- 6.0
Majority Voting for Code Generation
arXiv:2604.15618v1 Announce Type: new Abstract: We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies…
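The snippet above only names the idea, but "functional consensus" conventionally means clustering candidate programs by their observed input/output behaviour and returning a member of the largest cluster. A minimal sketch under that assumption follows; the function name, the crash-handling policy, and the sample candidates are all illustrative, not taken from the paper:

```python
# Hypothetical sketch of functional majority voting over candidate programs.
from collections import defaultdict

def functional_majority_vote(candidates, test_inputs):
    """Group candidate callables by their outputs on shared test inputs
    and return one representative from the largest behavioural cluster."""
    clusters = defaultdict(list)
    for fn in candidates:
        try:
            signature = tuple(fn(x) for x in test_inputs)
        except Exception:
            continue  # a crashing candidate gets no vote
        clusters[signature].append(fn)
    # the behaviour shared by the most candidates wins the vote
    return max(clusters.values(), key=len)[0]

# Illustrative candidates: three implement doubling, one is buggy squaring.
cands = [lambda x: x * 2, lambda x: x + x, lambda x: 2 * x, lambda x: x ** 2]
winner = functional_majority_vote(cands, [1, 2, 3])
```

Here syntactically different programs (`x * 2`, `x + x`, `2 * x`) land in one cluster because they behave identically, which is the point of voting on function behaviour rather than on program text.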
- 6.0
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models are increasingly evaluated across multiple simulation benchmarks, yet adding each benchmark to an evaluation pip…
- 6.0
Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
arXiv:2507.16727v3 Announce Type: replace Abstract: Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose…
- 6.0
AI Agents and Hard Choices
arXiv:2504.15304v2 Announce Type: replace Abstract: Can AI agents deal with hard choices -- cases where options are incommensurable because multiple objectives are pursued simultaneously? Adopting a t…
- 6.0
PolicyBank: Evolving Policy Understanding for LLM Agents
arXiv:2604.15505v1 Announce Type: new Abstract: LLM agents operating under organizational policies must comply with authorization constraints typically specified in natural language. In practice, such…