Posts

All the articles I've posted.

8.0
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
7.0
Why Fine-Tuning Encourages Hallucinations and How to Fix It
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
7.0
When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
9.0
Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
April 20, 2026
· cs.AI updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
7.0
Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Faster LLM Inference via Sequential Monte Carlo
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
7.0
${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation
April 20, 2026
· cs.LG updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
6.0
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
8.0
ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
April 20, 2026
· cs.AI updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
7.0
Qwen3.5-Omni Technical Report
April 20, 2026
· cs.CL updates on arXiv.org · collected 04/20 17:04
Major AI news and in-depth analysis
