Posts
All the articles I've posted.
- 6.7
Protecting Language Models Against Unauthorized Distillation through Trace Rewriting
arXiv:2602.15143v2 Announce Type: replace-cross Abstract: Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However…
- 6.7
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
arXiv:2510.13220v2 Announce Type: replace-cross Abstract: A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like "clever…
- 6.7
Revisiting Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
arXiv:2510.10959v3 Announce Type: replace-cross Abstract: Reasoning ability has become a defining capability of Large Language Models (LLMs), with Reinforcement Learning with Verifiable Rewards (RLVR)…
- 6.7
Dynamic Tool Dependency Retrieval for Lightweight Function Calling
arXiv:2512.17052v4 Announce Type: replace Abstract: Function calling agents powered by Large Language Models (LLMs) select external tools to automate complex tasks. On-device agents typically use a…
- 6.7
Information Router for Mitigating Modality Dominance in Vision-Language Models
arXiv:2604.16264v1 Announce Type: cross Abstract: Vision Language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance…
- 6.7
Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants
arXiv:2604.15727v1 Announce Type: cross Abstract: Large language models exhibit systematic limitations in structured logical reasoning: they conflate hypothesis generation with verification, cannot…
- 6.7
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
arXiv:2604.15464v1 Announce Type: cross Abstract: Large Language Model (LLM) deployment is increasingly shifting to cost-efficient accelerators like Google's Tensor Processing Units (TPUs), prioritizing…
- 6.7
Aerial Multi-Functional RIS in Fluid Antennas-Aided Full-Duplex Networks: A Self-Optimized Hybrid Deep Reinforcement Learning Approach
arXiv:2604.14309v2 Announce Type: replace-cross Abstract: To address high data traffic demands of sixth-generation (6G) networks, this paper proposes a novel architecture that integrates autonomous aerial…
- 6.7
EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs
arXiv:2604.15787v1 Announce Type: new Abstract: We introduce EVIL (EVolving Interpretable algorithms with LLMs), an approach that uses LLM-guided evolutionary search to discover…
- 6.7
Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs
arXiv:2604.15780v1 Announce Type: new Abstract: Machine learning models are increasingly deployed in real-world applications, but even aligned models such as Mistral and LLaVA still exhibit unsafe behaviors…
- 6.7
Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
arXiv:2604.15725v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have demonstrated strong capabilities in generating step-by-step reasoning chains alongside final answers, enabling their…
- 6.7
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation
arXiv:2505.21072v4 Announce Type: replace Abstract: Large Language Models (LLMs) enhanced with retrieval, an approach known as Retrieval-Augmented Generation (RAG), have achieved strong performance in…
- 6.7
FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models
arXiv:2604.15488v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers…
- 6.7
Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap
arXiv:2604.16256v1 Announce Type: cross Abstract: Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks…
- 6.7
Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
arXiv:2604.15839v1 Announce Type: cross Abstract: Most ATP benchmarks embed the final answer within the formal statement -- a convention we call "Easy Mode" -- a design that simplifies the task relative…
- 6.7
From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?
arXiv:2604.01608v2 Announce Type: replace Abstract: Multi-agent systems (MAS) tackle complex tasks by distributing expertise, though this often comes at the cost of heavy coordination overhead, context…
- 6.7
Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
arXiv:2604.16060v1 Announce Type: cross Abstract: Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However…
- 6.7
The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE
arXiv:2604.15468v1 Announce Type: cross Abstract: AI-based systems, currently driven largely by LLMs and tool-using agentic harnesses, are increasingly discussed as a possible threat to software engineering…
- 6.3
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
arXiv:2603.03332v3 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the…
- 6.3
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
arXiv:2601.05808v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich…