Posts

All the articles I've posted.

6.0
SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Crossings in Multi-Agent LLM Workflows
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Image Generators are Generalist Vision Learners
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
A Survey of Scaling in Large Language Model Reasoning
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Lightweight LLM Agent Memory with Small Language Models
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agents
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Programming Agents
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering
April 23, 2026
· arXiv · collected 04/23 08:00
7.0
JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents
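April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The JTPRO framework addresses the tool mis-selection and slot instantiation errors that LLM agents make when the number of tools is large.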
6.5
Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations
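April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Forage V2 achieves knowledge evolution and transfer in autonomous agent organizations through co-evolution and method isolation.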
6.5
EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation
April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
EvoAgent models skills as multi-file structured capability units, with support for trigger mechanisms and evolution metadata.
7.5
Stateless Decision Memory for Enterprise AI Agents
April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Reveals four hidden systems-engineering constraints behind the dominance of RAG in enterprise agent deployments.
6.5
FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory
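April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
A selective-forgetting framework for agents, inspired by hippocampal indexing and consolidation theory and the Ebbinghaus forgetting curve.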
6.3
Measuring the Machine: Evaluating Generative AI as Pluralist Sociotechnical Systems
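April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Re-examines the ontological significance of AI benchmarking from a measurement-theory perspective: benchmarks do not just measure models, they also shape how models appear.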
6.5
Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure
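April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Preregistered experiment: seven mainstream LLMs beat the human baseline at fraud detection, but suppress their warnings under pressure from investors who are already persuaded.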
6.3
Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization
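April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
Achieves self-improvement in multi-agent systems through Textual Parameter Graph (TPG) optimization, giving the optimizer the ability to learn from experience.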
7.7
SWE-chat: Coding Agent Interactions From Real Users in the Wild
April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
SWE-chat: the first large dataset of real coding-agent usage from open-source developers, with 6,000 sessions and 355K tool calls.
6.5
Diagnosing CFG Interpretation in LLMs
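April 23, 2026
· arXiv cs.AI · collected 04/23 14:32
The RoboGrid framework systematically breaks down how well LLMs act as in-context grammar interpreters along three dimensions: syntax, behavior, and semantics.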
