Posts
All the articles I've posted.
- 7.7
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
arXiv:2604.15409v1 Announce Type: new Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. T
- 7.7
VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation
arXiv:2510.27617v2 Announce Type: replace Abstract: Automation of Register Transfer Level (RTL) design can help developers meet increasing computational demands. Large Language Models (LLMs) show prom
- 7.7
Why Fine-Tuning Encourages Hallucinations and How to Fix It
arXiv:2604.15574v1 Announce Type: cross Abstract: Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information t
- 7.7
Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand
arXiv:2604.15744v1 Announce Type: new Abstract: This thesis investigates geographic dialect alignment in place-informed social media communities, focussing on New Zealand-related Reddit communities. B
- 7.7
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
arXiv:2604.15415v1 Announce Type: cross Abstract: Large language models (LLMs) have evolved into autonomous agents that rely on open skill ecosystems (e.g., ClawHub and Skills.Rest), hosting numerous
- 7.3
CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
arXiv:2602.01766v2 Announce Type: replace Abstract: The quadratic complexity and indefinitely growing key-value (KV) cache of standard Transformers pose a major barrier to long-context processing. To
- 7.3
In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs
arXiv:2512.02543v2 Announce Type: replace Abstract: Deploying LLM agents at scale typically requires choosing between quality and cost. Existing cost-reduction approaches fail to preserve agility: the
- 7.3
Dispatch-Aware Ragged Attention for Pruned Vision Transformers
arXiv:2604.15408v1 Announce Type: new Abstract: Token pruning methods for Vision Transformers (ViTs) promise quadratic reductions in attention FLOPs by dropping uninformative patches. Yet when pruned
- 7.3
Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation
arXiv:2604.15400v1 Announce Type: new Abstract: We present causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynam
- 7.3
ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
arXiv:2603.24621v2 Announce Type: replace Abstract: We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents
- 7.3
GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
arXiv:2604.15715v1 Announce Type: new Abstract: The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows.
- 7.0
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
arXiv:2602.01566v2 Announce Type: replace Abstract: Deep research is emerging as a representative long-horizon task for large language model (LLM) agents. However, long trajectories in deep research o
- 7.0
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
arXiv:2601.03699v2 Announce Type: replace Abstract: As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount.
- 7.0
Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness
arXiv:2510.09033v3 Announce Type: replace Abstract: Recent work suggests that LLMs "know what they don't know", positing that hallucinated and factually correct outputs arise from distinct internal pr
- 7.0
Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards
arXiv:2510.07774v3 Announce Type: replace Abstract: In this paper, we observe that current models are susceptible to reward hacking, leading to a substantial overestimation of a model's reasoning abil
- 7.0
Faster LLM Inference via Sequential Monte Carlo
arXiv:2604.15672v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates language model inference by drafting tokens from a cheap proposal model and verifying them against an expensive ta
- 7.0
Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures
arXiv:2604.15351v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the dominant parameter-efficient fine-tuning method for large language models, yet standard practice applies LoRA
- 7.0
AgentV-RL: Scaling Reward Modeling with Agentic Verifier
arXiv:2604.16004v1 Announce Type: new Abstract: Verifiers have been demonstrated to enhance LLM reasoning via test-time scaling (TTS). Yet, they face significant challenges in complex domains. Error p
- 7.0
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
arXiv:2604.15579v1 Announce Type: cross Abstract: AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions c
- 6.7
OjaKV: Context-Aware Online Low-Rank KV Cache Compression
arXiv:2509.21623v2 Announce Type: replace-cross Abstract: The expanding long-context capabilities of large language models are constrained by a significant memory bottleneck: the key-value (KV) cache