Posts
All the articles I've posted.
How robots learn: A brief, contemporary history
· 04/17 20:45 collected - 5.5
llm-anthropic 0.25
· 04/17 12:31 collected - 6.0
Exploration and Exploitation Errors Are Measurable for Language Model Agents
arXiv:2604.13151v1 Announce Type: new Abstract: Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these
- 5.5
SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
arXiv:2604.13180v1 Announce Type: new Abstract: Recent advances in agentic AI have enabled increasingly autonomous workflows, but existing systems still face substantial challenges in achieving reliab
- 7.0
Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
arXiv:2506.09457v3 Announce Type: replace-cross Abstract: Direct Alignment Algorithms (DAAs), such as Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO), have emerged as e
- 5.5
Threshold Differential Attention for Sink-Free, Ultra-Sparse, and Non-Dispersive Language Modeling
arXiv:2601.12145v2 Announce Type: replace Abstract: Softmax attention struggles with long contexts due to structural limitations: the strict sum-to-one constraint forces attention sinks on irrelevant
- 6.0
ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning
arXiv:2601.08310v2 Announce Type: replace Abstract: Recent Large Reasoning Models (LRMs) achieve strong performance by leveraging long-form Chain-of-Thought (CoT) reasoning, but uniformly applying ove
- 5.5
Cornfigurator: Automated Planning for Any-to-Any Multimodal Model Serving
arXiv:2512.14098v3 Announce Type: replace Abstract: Any-to-Any models are an emerging class of multimodal models that accept combinations of text and multimodal data as input and generate them as outp
- 6.5
Enabling Agents to Communicate Entirely in Latent Space
arXiv:2511.09149v4 Announce Type: replace Abstract: While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling
- 7.5
AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
arXiv:2505.10846v3 Announce Type: replace Abstract: This paper presents AutoRAN, the first framework to automate the hijacking of internal safety reasoning in large reasoning models (LRMs). At its cor
- 6.0
Generalization in LLM Problem Solving: The Case of the Shortest Path
arXiv:2604.15306v1 Announce Type: cross Abstract: Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such a
- 6.0
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
arXiv:2601.14724v3 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. Howe
- 7.0
Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations
arXiv:2604.15302v1 Announce Type: cross Abstract: LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a
- 6.0
TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems
arXiv:2601.10120v2 Announce Type: replace-cross Abstract: Optimizing communication topology in LLM-based multi-agent systems is critical for enabling collective intelligence. Existing methods mainly re
- 5.5
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation
arXiv:2604.15109v1 Announce Type: cross Abstract: Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation is a persistent challenge. Although recent
- 6.5
Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap
arXiv:2604.15075v1 Announce Type: cross Abstract: Open-weight Small Language Models (SLMs) can provide faster local inference at lower financial cost, but may not achieve the same performance level as
- 7.0
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
arXiv:2604.15022v1 Announce Type: cross Abstract: Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the routing
- 6.5
Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
arXiv:2604.14888v1 Announce Type: cross Abstract: Recent advances in vision language models (VLMs) offer reasoning capabilities, yet how these unfold and integrate visual and textual information remai
- 6.5
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
arXiv:2604.14732v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action.