Posts

All the articles I've posted.

6.0
ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Differentiable Conformal Training for LLM Reasoning Factuality
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration
April 23, 2026
· arXiv · collected 04/23 08:00
7.0
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
Separable Pathways for Causal Reasoning: How Architectural Scaffolding Enables Hypothesis-Space Restructuring in LLM Agents
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
April 23, 2026
· arXiv · collected 04/23 08:00
7.0
Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Coverage, Not Averages: Semantic Stratification for Trustworthy Retrieval Evaluation
April 23, 2026
· arXiv · collected 04/23 08:00
7.0
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
April 23, 2026
· arXiv · collected 04/23 08:00
6.5
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models
April 23, 2026
· arXiv · collected 04/23 08:00
6.0
From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
April 23, 2026
· arXiv · collected 04/23 08:00
7.0
BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
April 23, 2026
· arXiv · collected 04/23 08:00
