Tag: ai-agent
All the articles with the tag "ai-agent".
- 6.4
Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis
arXiv:2603.19282v2 Announce Type: replace Abstract: In many real-world applications, large language models (LLMs) operate as independent agents wit...
- 6.4
Can We Predict Before Executing Machine Learning Agents?
arXiv:2601.05930v2 Announce Type: replace Abstract: Autonomous machine learning agents have revolutionized scientific discovery, yet they remain co...
- 6.4
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration
arXiv:2604.05952v1 Announce Type: cross Abstract: As agent-based systems continue to evolve, deep research agents are capable of automatically gene...
- 6.4
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv:2603.21357v2 Announce Type: replace-cross Abstract: LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% ...
- 6.4
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
arXiv:2604.05681v1 Announce Type: cross Abstract: We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic ...
- 6.4
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
arXiv:2601.03054v4 Announce Type: replace-cross Abstract: Recent research on medical MLLMs has gradually shifted its focus from image-level underst...
- 6.4
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
arXiv:2604.05483v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown a high capability in answering questions on a diverse ran...
- 6.4
MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems
arXiv:2604.05075v1 Announce Type: cross Abstract: Multi-objective retrosynthesis planning is a critical chemistry task requiring dynamic balancing ...
- 6.4
Learning to Retrieve from Agent Trajectories
arXiv:2604.04949v1 Announce Type: cross Abstract: Information retrieval (IR) systems have traditionally been designed and trained for human users, ...
- 6.4
MM-tau-p²: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings
arXiv:2603.09643v4 Announce Type: replace-cross Abstract: Current evaluation frameworks and benchmarks for LLM powered agents focus on text chat dr...
- 6.4
ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments
arXiv:2604.06111v1 Announce Type: cross Abstract: Existing Agent benchmarks suffer from two critical limitations: high environment interaction over...
- 6.4
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
arXiv:2604.06013v1 Announce Type: cross Abstract: This paper presents epistemic blinding in the context of an agentic system that uses large langua...
- 6.4
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures
arXiv:2604.05966v1 Announce Type: new Abstract: Financial reporting systems increasingly use large language models (LLMs) to extract and summarize ...
- 6.4
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
arXiv:2409.19894v5 Announce Type: replace-cross Abstract: Code translation transforms code between programming languages while preserving functiona...
- 6.4
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
arXiv:2604.05808v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive deci...
- 6.4
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning
arXiv:2604.05943v1 Announce Type: new Abstract: Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous ...
- 6.4
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
arXiv:2604.06132v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents executing multi-step workflows...
- 6.4
Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
arXiv:2604.04936v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking str...
- 6.4
MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library
arXiv:2604.05458v1 Announce Type: cross Abstract: Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods ar...
- 6.4
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations
arXiv:2604.05440v1 Announce Type: cross Abstract: Modern Security Operations Centers struggle with alert fatigue, fragmented tooling, and limited c...