Posts
All the articles I've posted.
- 6.3
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
arXiv:2604.15741v1 Announce Type: new Abstract: Uncertainty estimation is a promising approach to detect hallucinations in large language models (LLMs). Recent approaches commonly depend on model inte…
- 6.3
Preference Estimation via Opponent Modeling in Multi-Agent Negotiation
arXiv:2604.15687v1 Announce Type: new Abstract: Automated negotiation in complex, multi-party and multi-issue settings critically depends on accurate opponent modeling. However, conventional numerical…
- 6.3
AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
arXiv:2604.16207v1 Announce Type: cross Abstract: As forgery types continue to emerge consistently, Incremental Face Forgery Detection (IFFD) has become a crucial paradigm. However, existing methods t…
- 6.3
ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis
arXiv:2604.16205v1 Announce Type: cross Abstract: Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic s…
- 6.3
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
arXiv:2604.15648v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) consistently require new arenas to guide their expanding boundaries, yet their capabilities with hypergraphs remain…
- 6.3
DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation
arXiv:2604.15593v1 Announce Type: new Abstract: Large language models compress heterogeneous knowledge into a single parameter space, allowing facts from different domains to interfere during generati…
- 6.3
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
arXiv:2604.15871v1 Announce Type: cross Abstract: The evaluation of visual editing models remains fragmented across methods and modalities. Existing benchmarks are often tailored to specific paradigms…
- 6.3
The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning
arXiv:2604.15695v1 Announce Type: cross Abstract: Cooperative equilibria are fragile. When agents learn alongside each other rather than in a fixed environment, the process of learning destabilizes th…
- 6.0
COMPASS: Benchmarking Constrained Optimization in LLM Agents
arXiv:2510.07043v2 Announce Type: replace Abstract: Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks like travel planning, shop…
- 6.0
The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring
arXiv:2604.15702v1 Announce Type: cross Abstract: We introduce a cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework…
- 6.0
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
arXiv:2604.15622v1 Announce Type: cross Abstract: Language-aligned vision foundation models (VFMs) enable versatile visual understanding for always-on contextual AI, but their deployment on edge devic…
- 6.0
When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems
arXiv:2604.15343v1 Announce Type: cross Abstract: We report a detailed autoethnographic case study of a single subject who deliberately constructed and operated a multi-modal prompt-engineering system…
- 6.0
Majority Voting for Code Generation
arXiv:2604.15618v1 Announce Type: new Abstract: We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies…
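The snippet above only names the idea, but "functional consensus" conventionally means clustering candidate programs by their observed input/output behaviour and returning a member of the largest cluster. A minimal sketch under that assumption follows; the function name, the crash-handling policy, and the sample candidates are all illustrative, not taken from the paper:

```python
# Hypothetical sketch of functional majority voting over candidate programs.
from collections import defaultdict

def functional_majority_vote(candidates, test_inputs):
    """Group candidate callables by their outputs on shared test inputs
    and return one representative from the largest behavioural cluster."""
    clusters = defaultdict(list)
    for fn in candidates:
        try:
            signature = tuple(fn(x) for x in test_inputs)
        except Exception:
            continue  # a crashing candidate gets no vote
        clusters[signature].append(fn)
    # the behaviour shared by the most candidates wins the vote
    return max(clusters.values(), key=len)[0]

# Illustrative candidates: three implement doubling, one is buggy squaring.
cands = [lambda x: x * 2, lambda x: x + x, lambda x: 2 * x, lambda x: x ** 2]
winner = functional_majority_vote(cands, [1, 2, 3])
```

Here syntactically different programs (`x * 2`, `x + x`, `2 * x`) land in one cluster because they behave identically, which is the point of voting on function behaviour rather than on program text.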
- 6.0
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models are increasingly evaluated across multiple simulation benchmarks, yet adding each benchmark to an evaluation pip…
- 6.0
Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
arXiv:2507.16727v3 Announce Type: replace Abstract: Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose…
- 6.0
AI Agents and Hard Choices
arXiv:2504.15304v2 Announce Type: replace Abstract: Can AI agents deal with hard choices -- cases where options are incommensurable because multiple objectives are pursued simultaneously? Adopting a t…
- 6.0
PolicyBank: Evolving Policy Understanding for LLM Agents
arXiv:2604.15505v1 Announce Type: new Abstract: LLM agents operating under organizational policies must comply with authorization constraints typically specified in natural language. In practice, such…