Posts

All the articles I've posted.

5.0
The Download: bad news for inner Neanderthals, and AI warfare's human illusion
2026-04-17
· Collected 04/17 20:45
6.0
How robots learn: A brief, contemporary history
2026-04-17
· Collected 04/17 20:45
5.5
llm-anthropic 0.25
2026-04-17
· Collected 04/17 12:31
6.0
Exploration and Exploitation Errors Are Measurable for Language Model Agents
2026-04-17
· cs.AI updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.13151v1 Announce Type: new Abstract: Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these
5.5
SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
2026-04-17
· cs.AI updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.13180v1 Announce Type: new Abstract: Recent advances in agentic AI have enabled increasingly autonomous workflows, but existing systems still face substantial challenges in achieving reliab
7.0
Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2506.09457v3 Announce Type: replace-cross Abstract: Direct Alignment Algorithms (DAAs), such as Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO), have emerged as e
5.5
Threshold Differential Attention for Sink-Free, Ultra-Sparse, and Non-Dispersive Language Modeling
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2601.12145v2 Announce Type: replace Abstract: Softmax attention struggles with long contexts due to structural limitations: the strict sum-to-one constraint forces attention sinks on irrelevant
6.0
ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2601.08310v2 Announce Type: replace Abstract: Recent Large Reasoning Models (LRMs) achieve strong performance by leveraging long-form Chain-of-Thought (CoT) reasoning, but uniformly applying ove
5.5
Cornfigurator: Automated Planning for Any-to-Any Multimodal Model Serving
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2512.14098v3 Announce Type: replace Abstract: Any-to-Any models are an emerging class of multimodal models that accept combinations of text and multimodal data as input and generate them as outp
6.5
Enabling Agents to Communicate Entirely in Latent Space
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2511.09149v4 Announce Type: replace Abstract: While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling
7.5
AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2505.10846v3 Announce Type: replace Abstract: This paper presents AutoRAN, the first framework to automate the hijacking of internal safety reasoning in large reasoning models (LRMs). At its cor
6.0
Generalization in LLM Problem Solving: The Case of the Shortest Path
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.15306v1 Announce Type: cross Abstract: Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such a
6.0
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
2026-04-17
· cs.CL updates on arXiv.org · Collected 04/17 12:31
arXiv:2601.14724v3 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. Howe
7.0
Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.15302v1 Announce Type: cross Abstract: LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a
6.0
TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems
2026-04-17
· cs.CL updates on arXiv.org · Collected 04/17 12:31
arXiv:2601.10120v2 Announce Type: replace-cross Abstract: Optimizing communication topology in LLM-based multi-agent system is critical for enabling collective intelligence. Existing methods mainly re
5.5
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.15109v1 Announce Type: cross Abstract: Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation is a persistent challenge. Although recent
6.5
Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.15075v1 Announce Type: cross Abstract: Open-weight Small Language Models(SLMs) can provide faster local inference at lower financial cost, but may not achieve the same performance level as
7.0
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.15022v1 Announce Type: cross Abstract: Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the routing
6.5
Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.14888v1 Announce Type: cross Abstract: Recent advances in vision language models (VLMs) offer reasoning capabilities, yet how these unfold and integrate visual and textual information remai
6.5
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
2026-04-17
· cs.LG updates on arXiv.org · Collected 04/17 12:31
arXiv:2604.14732v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action.
