Tag: ai-agent

All the articles with the tag "ai-agent".

6.4
Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2603.19282v2 Announce Type: replace Abstract: In many real-world applications, large language models (LLMs) operate as independent agents wit...
6.4
Can We Predict Before Executing Machine Learning Agents?
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2601.05930v2 Announce Type: replace Abstract: Autonomous machine learning agents have revolutionized scientific discovery, yet they remain co...
6.4
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05952v1 Announce Type: cross Abstract: As agent-based systems continue to evolve, deep research agents are capable of automatically gene...
6.4
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2603.21357v2 Announce Type: replace-cross Abstract: LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% ...
6.4
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05681v1 Announce Type: cross Abstract: We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic ...
6.4
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2601.03054v4 Announce Type: replace-cross Abstract: Recent research on medical MLLMs has gradually shifted its focus from image-level underst...
6.4
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05483v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown a high capability in answering questions on a diverse ran...
6.4
MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05075v1 Announce Type: cross Abstract: Multi-objective retrosynthesis planning is a critical chemistry task requiring dynamic balancing ...
6.4
Learning to Retrieve from Agent Trajectories
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.04949v1 Announce Type: cross Abstract: Information retrieval (IR) systems have traditionally been designed and trained for human users, ...
6.4
MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2603.09643v4 Announce Type: replace-cross Abstract: Current evaluation frameworks and benchmarks for LLM powered agents focus on text chat dr...
6.4
ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.06111v1 Announce Type: cross Abstract: Existing Agent benchmarks suffer from two critical limitations: high environment interaction over...
6.4
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.06013v1 Announce Type: cross Abstract: This paper presents epistemic blinding in the context of an agentic system that uses large langua...
6.4
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05966v1 Announce Type: new Abstract: Financial reporting systems increasingly use large language models (LLMs) to extract and summarize ...
6.4
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2409.19894v5 Announce Type: replace-cross Abstract: Code translation transforms code between programming languages while preserving functiona...
6.4
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05808v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive deci...
6.4
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05943v1 Announce Type: new Abstract: Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous ...
6.4
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.06132v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents executing multi-step workflows...
6.4
Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.04936v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking str...
6.4
MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05458v1 Announce Type: cross Abstract: Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods ar...
6.4
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05440v1 Announce Type: cross Abstract: Modern Security Operations Centers struggle with alert fatigue, fragmented tooling, and limited c...