Posts

All the articles I've posted.

7.0
Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging
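April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Visual feedback unlocks reliable GUI code generation and debugging: it addresses the core bottleneck coding agents face in graphical interface scenarios.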
6.5
Soft-Label Governance for Distributional Safety in Multi-Agent Systems
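April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The SWARM framework: replaces binary good/bad classification with soft probabilistic labels to evaluate distributional safety in multi-agent systems.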
7.7
SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution
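April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
SolidCoder exposes a fundamental flaw in code generation, the 'mental-reality gap': LLMs hallucinate execution traces and confidently validate buggy code. It bridges the gap through concrete execution.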
7.3
More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems
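April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
A tribute to Anderson's classic paper, exploring a theory of emergence in AI-native software ecosystems: the fundamental challenge of components that are individually correct yet fail as a whole.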
6.5
Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems
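April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Trust boundary confusion in vision-language agentic systems: how an agent can tell real from fake when environmental signals (such as traffic lights) are forged by visual injection attacks.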
6.5
Behavioral Transfer in AI Agents: Evidence and Privacy Implications
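April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
AI agents' behavior shifts toward the private characteristics of their deployers, a finding with far-reaching implications for agent privacy protection.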
6.5
Information Aggregation with AI Agents
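April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Can AI agents aggregate dispersed private information through trading? Prediction market experiments measure agents' information aggregation ability.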
7.0
Auditing and Controlling AI Agent Actions in Spreadsheets
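April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Auditing and controlling AI agent actions in spreadsheets: addresses the core user-experience problem that intermediate steps are opaque during autonomous agent execution.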
6.0
Taint-Style Vulnerability Detection for Node.js Packages Using LLM Agent Reasoning
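April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Uses LLM agent reasoning to detect and confirm taint-style vulnerabilities in Node.js packages, tackling the challenges of dynamic JavaScript features and a massive number of dependencies.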
6.5
Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
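April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Cortex 2.0: grounds world models in real-world industrial deployment, addressing the compounding failure modes of VLA models in long-horizon robotic tasks.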
7.0
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
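April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
AgentLens: adaptive visual modalities for mobile GUI agents, resolving the dilemma between foreground execution (transparent but blocks multitasking) and background execution (allows multitasking but offers no visual awareness).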
7.3
Image Generators are Generalist Vision Learners
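April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Image generation models are generalist vision learners: provides systematic evidence that visual understanding emerges from generative pretraining, challenging the generation/understanding dichotomy.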
7.0
Shift-Up: A Framework for Software Engineering Guardrails in AI-Native Software Development
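April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The Shift-Up framework: provides engineering guardrails for AI-native software development (vibe coding), confronting three pain points: architectural drift, traceability, and maintainability.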
6.3
Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem
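April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Reframes value alignment from a technical/normative challenge into a problem of governance structure: not whether a system is aligned in the abstract, but 'aligned enough for whom, and at what cost'.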
7.0
A Survey of Scaling in Large Language Model Reasoning
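April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
A comprehensive survey of scaling in LLM reasoning: scaling reasoning ability is more complex than scaling data and model size, and can even have negative effects.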
6.5
Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs
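April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The REST/REST+ benchmark: systematically evaluates cross-modal inconsistency in multimodal large language models, where the same content yields contradictory answers across modalities.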
7.0
Lightweight LLM Agent Memory with Small Language Models
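April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Uses small language models (SLMs) to build a lightweight memory system for LLM agents, addressing the core problem of unstable accuracy in retrieval-based external memory.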
6.5
From Admission to Invariants: Measuring Deviation in Delegated Agent Systems
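April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The Agent Control Protocol reveals a structural limit of delegated agent systems: a correctly functioning execution engine enters a regime where behavioral drift is invisible, because execution signals operate below the layer at which deviation is measurable.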
7.3
LiteResearcher: Scalable Agentic RL Training Framework for Deep Research Agent
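April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
LiteResearcher: a scalable agentic RL training framework that addresses two coupled bottlenecks for deep research agents: scarce synthetic data and dependence on real-world search.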
8.0
The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents
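April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
OpenHands Software Agent SDK: a composable, extensible foundation for production-grade software agents, covering three core needs: flexibility, secure execution, and user interaction.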
