Tag: ai-agent
All the articles with the tag "ai-agent".
- 7.5
Changes to GitHub Copilot Individual plans
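GitHub Copilot has announced a pause on new sign-ups for its individual plans and tightened usage limits after agent workflows drove a surge in compute costs, a sign that AI agents are reshaping the business model of SaaS products.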
- 8.0
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
arXiv:2604.05969v1 Announce Type: cross Abstract: The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and now governed by th...
- 7.7
Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
arXiv:2604.04978v1 Announce Type: cross Abstract: Claude Code's auto mode is the first deployed permission system for AI coding agents, using a two...
- 7.7
Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use
arXiv:2604.05432v1 Announce Type: cross Abstract: Tool-use large language model (LLM) agents are increasingly deployed to support sensitive workflo...
- 7.3
Architecture Without Architects: How AI Coding Agents Shape Software Architecture
arXiv:2604.04990v1 Announce Type: cross Abstract: AI coding agents select frameworks, scaffold infrastructure, and wire integrations, often in seco...
- 7.3
Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw
arXiv:2604.05589v1 Announce Type: cross Abstract: Agentic AI systems are increasingly deployed as personal assistants and are likely to become a co...
- 7.3
Scaling Coding Agents via Atomic Skills
arXiv:2604.05013v1 Announce Type: cross Abstract: Current LLM coding agents are predominantly trained on composite benchmarks (e.g., bug fixing), w...
- 6.7
URSA: The Universal Research and Scientific Agent
arXiv:2506.22653v2 Announce Type: replace Abstract: Large language models (LLMs) have moved far beyond their initial form as simple chatbots, now c...
- 6.7
Stop Fixating on Prompts: Reasoning Hijacking and Constraint Tightening for Red-Teaming LLM Agents
arXiv:2604.05549v1 Announce Type: new Abstract: With the widespread application of LLM-based agents across various domains, their complexity has in...
- 6.7
Gym-Anything: Turn any Software into an Agent Environment
arXiv:2604.06126v1 Announce Type: cross Abstract: Computer-use agents hold the promise of assisting in a wide range of digital economic activities....
- 6.7
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents
arXiv:2604.04979v1 Announce Type: cross Abstract: Coding agents repeatedly consume long tool observations even though only a small fraction of each...
- 6.4
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives
arXiv:2604.06091v1 Announce Type: cross Abstract: Large language model (LLM) agents are increasingly acting as human delegates in multi-agent envir...
- 6.4
TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering
arXiv:2510.07432v2 Announce Type: replace Abstract: Large language models (LLMs) exhibit strong symbolic and compositional reasoning, yet they stru...
- 6.4
Beyond Syntax: Action Semantics Learning for App Agents
arXiv:2506.17697v3 Announce Type: replace Abstract: The recent development of Large Language Models (LLMs) enables the rise of App agents that inte...
- 6.4
UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces
arXiv:2505.00472v2 Announce Type: replace Abstract: Agentic Artificial Intelligence (AI) constitutes a transformative paradigm in the evolution of ...
- 6.4
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries
arXiv:2604.06148v1 Announce Type: cross Abstract: The governance of artificial intelligence has a blind spot: the machine identities that AI system...
- 6.4
Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities
arXiv:2604.05339v1 Announce Type: new Abstract: As LLMs become increasingly integrated into human society, evaluating their orientations on human v...
- 6.4
OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models
arXiv:2511.10287v4 Announce Type: replace-cross Abstract: Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into eve...
- 6.4
EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents
arXiv:2604.05557v1 Announce Type: new Abstract: Scientific research follows multi-turn, multi-step workflows that require proactively searching the...
- 6.4
Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue
arXiv:2604.05552v1 Announce Type: new Abstract: Large Language Models demonstrate outstanding performance in many language tasks but still face fun...