Posts
All the articles I've posted.
- 5.5
AMA: Adaptive Memory via Multi-Agent Collaboration
arXiv:2601.20352v3 Announce Type: replace Abstract: The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and comple…
- 5.5
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Verifiable Code
arXiv:2604.15151v1 Announce Type: new Abstract: Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic tr…
- 6.5
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
arXiv:2510.14133v2 Announce Type: replace Abstract: Agentic AI systems, which leverage multiple autonomous agents and large language models (LLMs), are increasingly used to address complex, multi-step…
- 5.5
IE as Cache: Information Extraction Enhanced Agentic Reasoning
arXiv:2604.14930v1 Announce Type: new Abstract: Information Extraction aims to distill structured, decision-relevant information from unstructured text, serving as a foundation for downstream understa…
- 6.0
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
arXiv:2508.10164v2 Announce Type: replace Abstract: Recent advances in Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning…
- 5.5
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
arXiv:2505.19662v3 Announce Type: replace Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI,…
- 6.0
Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
arXiv:2604.14808v1 Announce Type: new Abstract: Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM…
- 5.5
Agentic AI Optimisation (AAIO): what it is, how it works, why it matters, and how to do it
arXiv:2504.12482v2 Announce Type: replace Abstract: The emergence of Agentic Artificial Intelligence (AAI) systems capable of independently initiating digital interactions necessitates a new optimisat…
- 5.5
StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation
arXiv:2604.14631v1 Announce Type: new Abstract: Effective code generation requires both model capability and a problem representation that carefully structures how models reason and plan. Existing app…
- 5.5
Mechanistic Decoding of Cognitive Constructs in LLMs
arXiv:2604.14593v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate increasingly sophisticated affective capabilities, the internal mechanisms by which they process complex…
- 5.5
Psychological Steering of Large Language Models
arXiv:2604.14463v1 Announce Type: new Abstract: Large language models (LLMs) emulate a consistent human-like behavior that can be shaped through activation-level interventions. This paradigm is conver…
- 6.5
The Autocorrelation Blind Spot: Why 42% of Turn-Level Findings in LLM Conversational Analysis May Be Illusory
arXiv:2604.14414v1 Announce Type: new Abstract: Turn-level metrics are widely used to evaluate properties of multi-turn human-LLM conversations, from safety and sycophancy to dialogue quality. However…
- 6.0
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal LLMs
arXiv:2604.14363v1 Announce Type: new Abstract: Multimodal language models systematically underperform on visual perception tasks, yet the structure underlying this failure remains poorly understood.
- 5.5
APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Context
arXiv:2604.14362v1 Announce Type: new Abstract: Large language models still struggle with reliable long-term conversational memory: simply enlarging context windows or applying naive retrieval often i…
- 5.5
Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLMs
arXiv:2604.14325v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong performance and have revolutionized NLP, but their lack of explainability keeps them treated as black boxes,…
- 5.5
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Malicious Visual Attacks
arXiv:2604.13803v1 Announce Type: cross Abstract: Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understo…
- 5.5
CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization
arXiv:2604.14214v1 Announce Type: new Abstract: Large Language Models utilizing reasoning techniques improve task performance but incur significant latency and token costs due to verbose generation. E…
- 6.0
SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
arXiv:2604.13630v1 Announce Type: cross Abstract: The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context…
- 6.0
Internal Knowledge Without External Expression: Probing the Generalization Boundaries of Factual Knowledge in LLMs
arXiv:2604.14180v1 Announce Type: new Abstract: We train a 318M-parameter Transformer language model from scratch on a curated corpus of 1.56 billion tokens of pure Classical Chinese, with zero Englis…
- 5.5