Posts
All the articles I've posted.
- 5.5
AMA: Adaptive Memory via Multi-Agent Collaboration
arXiv:2601.20352v3 Announce Type: replace Abstract: The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and comple…
- 5.5
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Verifiable Code
arXiv:2604.15151v1 Announce Type: new Abstract: Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic tr…
- 6.5
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
arXiv:2510.14133v2 Announce Type: replace Abstract: Agentic AI systems, which leverage multiple autonomous agents and large language models (LLMs), are increasingly used to address complex, multi-step…
- 5.5
IE as Cache: Information Extraction Enhanced Agentic Reasoning
arXiv:2604.14930v1 Announce Type: new Abstract: Information Extraction aims to distill structured, decision-relevant information from unstructured text, serving as a foundation for downstream understa…
- 6.0
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
arXiv:2508.10164v2 Announce Type: replace Abstract: Recent advances in Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning…
- 5.5
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
arXiv:2505.19662v3 Announce Type: replace Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI,…
- 6.0
Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
arXiv:2604.14808v1 Announce Type: new Abstract: Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM…
- 5.5
Agentic AI Optimisation (AAIO): what it is, how it works, why it matters, and how to do it
arXiv:2504.12482v2 Announce Type: replace Abstract: The emergence of Agentic Artificial Intelligence (AAI) systems capable of independently initiating digital interactions necessitates a new optimisat…
- 5.5
StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation
arXiv:2604.14631v1 Announce Type: new Abstract: Effective code generation requires both model capability and a problem representation that carefully structures how models reason and plan. Existing app…
- 5.5
Mechanistic Decoding of Cognitive Constructs in LLMs
arXiv:2604.14593v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate increasingly sophisticated affective capabilities, the internal mechanisms by which they process complex…
- 5.5
Psychological Steering of Large Language Models
arXiv:2604.14463v1 Announce Type: new Abstract: Large language models (LLMs) emulate a consistent human-like behavior that can be shaped through activation-level interventions. This paradigm is conver…
- 6.5
The Autocorrelation Blind Spot: Why 42% of Turn-Level Findings in LLM Conversational Analysis May Be Illusory
arXiv:2604.14414v1 Announce Type: new Abstract: Turn-level metrics are widely used to evaluate properties of multi-turn human-LLM conversations, from safety and sycophancy to dialogue quality. However…
- 6.0
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal LLMs
arXiv:2604.14363v1 Announce Type: new Abstract: Multimodal language models systematically underperform on visual perception tasks, yet the structure underlying this failure remains poorly understood.
- 5.5
APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Context
arXiv:2604.14362v1 Announce Type: new Abstract: Large language models still struggle with reliable long-term conversational memory: simply enlarging context windows or applying naive retrieval often i…
- 5.5
Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLMs
arXiv:2604.14325v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong performance and have revolutionized NLP, but their lack of explainability keeps them treated as black boxes,…
- 5.5
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Malicious Visual Attacks
arXiv:2604.13803v1 Announce Type: cross Abstract: Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understo…
- 5.5
CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization
arXiv:2604.14214v1 Announce Type: new Abstract: Large Language Models utilizing reasoning techniques improve task performance but incur significant latency and token costs due to verbose generation. E…
- 6.0
SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
arXiv:2604.13630v1 Announce Type: cross Abstract: The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context…
- 6.0
Internal Knowledge Without External Expression: Probing the Generalization Boundaries of Factual Knowledge in LLMs
arXiv:2604.14180v1 Announce Type: new Abstract: We train a 318M-parameter Transformer language model from scratch on a curated corpus of 1.56 billion tokens of pure Classical Chinese, with zero Englis…
- 5.5