Posts
All the articles I've posted.
- 6.3
Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration
arXiv:2512.07173v4 Announce Type: replace Abstract: We present CadLLM, a training-free method to accelerate the inference throughput of diffusion-based LLMs (dLLMs). We first investigate the dynamic…
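The excerpt cuts off before the method, so for context only: training-free dLLM accelerators generally build on a confidence-thresholded parallel unmasking loop. A minimal sketch of that generic loop, not CadLLM's actual algorithm (the model interface, `mask_id`, and the 0.9 threshold are illustrative assumptions):

```python
import torch

def parallel_unmask_step(model, tokens, mask_id, threshold=0.9):
    """One decoding step for a masked-diffusion LM: predict every masked
    position in parallel, then commit only high-confidence predictions.
    `model(tokens)` is assumed to return (seq_len, vocab) logits."""
    masked = tokens == mask_id
    probs = model(tokens).softmax(dim=-1)
    conf, pred = probs.max(dim=-1)               # per-position confidence
    # Unmask every masked position whose confidence clears the threshold;
    # always commit at least the most confident one to guarantee progress.
    commit = masked & (conf >= threshold)
    if masked.any() and not commit.any():
        best = conf.masked_fill(~masked, float("-inf")).argmax()
        commit[best] = True
    return torch.where(commit, pred, tokens)
```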
- 6.3
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
arXiv:2509.25300v4 Announce Type: replace Abstract: While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL)…
- 6.3
Online Distributionally Robust LLM Alignment via Regression to Relative Reward
arXiv:2509.19104v2 Announce Type: replace Abstract: Reinforcement Learning with Human Feedback (RLHF) has become crucial for aligning Large Language Models (LLMs) with human intent. However, existing…
- 6.3
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
arXiv:2604.14967v2 Announce Type: replace-cross Abstract: Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG…
- 6.3
Reward Modeling for Scientific Writing Evaluation
arXiv:2601.11374v2 Announce Type: replace Abstract: Scientific writing is an expert-domain task that demands deep domain knowledge, task-specific requirements and reasoning capabilities that leverage…
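The abstract stops early; as background, reward models for text evaluation are commonly trained with a Bradley-Terry pairwise objective over preferred/rejected pairs. A minimal sketch of that standard objective (the random scores stand in for a real scoring model; nothing here is claimed to be this paper's setup):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(score_chosen: torch.Tensor,
                         score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the scalar reward of the preferred
    writing sample above that of the rejected one."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Illustrative usage with placeholder scores:
chosen, rejected = torch.randn(8), torch.randn(8)
loss = pairwise_reward_loss(chosen, rejected)
```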
- 6.3
Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization
arXiv:2604.16247v1 Announce Type: new Abstract: We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning…
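As a reference point for the "dual contrastive alignment" in the title: cross-modal alignment is typically trained with a symmetric InfoNCE loss over paired embeddings. A generic sketch under the assumption of L2-normalized (batch, dim) embeddings; HILBERT's structure-preserving and information-balanced regularizers are not reproduced here:

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07):
    """Bidirectional contrastive loss over a batch of paired embeddings.
    a, b: (batch, dim) L2-normalized embeddings of the two modalities."""
    logits = a @ b.t() / tau                      # pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # i-th row matches i-th col
    # Average the a->b and b->a cross-entropy terms (the two "reciprocal" directions).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```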
- 6.3
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
arXiv:2511.10262v3 Announce Type: replace Abstract: Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared…
- 6.3
Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation
arXiv:2511.02626v3 Announce Type: replace Abstract: Prior works have shown that fine-tuning on new knowledge can induce factual hallucinations in large language models (LLMs), leading to incorrect outputs…
- 6.3
JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
arXiv:2604.16171v1 Announce Type: new Abstract: Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank…
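The excerpt truncates at the adapter definition; for reference, the low-rank adapter pattern the abstract alludes to looks like the following (a plain LoRA layer, not JumpLoRA's sparse variant; the rank and scaling values are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    h = W x + (alpha / r) * B A x. Only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```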
- 6.3
Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting
arXiv:2510.17210v3 Announce Type: replace Abstract: The increase in computing power and the necessity of AI-assisted decision-making boost the growing application of large language models (LLMs)…
- 6.3
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
arXiv:2505.13792v2 Announce Type: replace Abstract: Recent advances in reasoning-focused Large Language Models (LLMs) have introduced Chain-of-Thought (CoT) traces - intermediate reasoning steps generated…
- 6.3
Evaluating LLM Simulators as Differentially Private Data Generators
arXiv:2604.15461v1 Announce Type: new Abstract: LLM-based simulators offer a promising path for generating complex synthetic data where traditional differentially private (DP) methods struggle with…
- 6.3
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
arXiv:2604.15972v1 Announce Type: cross Abstract: LLM-driven multi-agent frameworks address complex reasoning tasks through multi-role collaboration. However, existing approaches often suffer from…
- 6.3
A Case Study on the Impact of Anonymization Along the RAG Pipeline
arXiv:2604.15958v1 Announce Type: cross Abstract: Despite the considerable promise of Retrieval-Augmented Generation (RAG), many real-world use cases may create privacy concerns, where the purported…
- 6.3
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking
arXiv:2604.15621v1 Announce Type: cross Abstract: Adaptive Retrieval-Augmented Generation aims to mitigate the interference of extraneous noise by dynamically determining the necessity of retrieving…
- 6.3
A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG
arXiv:2503.02497v4 Announce Type: replace-cross Abstract: Large Language Models (LLMs) offer powerful capabilities in code generation, natural language understanding, and domain-specific reasoning…
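The excerpt ends before the pipeline details; the retrieve-then-prompt skeleton that a RAG setup like this rests on is roughly the following (a generic sketch assuming pre-computed embeddings and a hypothetical corpus of PennyLane snippets; it is not the paper's dataset or pipeline):

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3):
    """Return indices of the k corpus documents most similar to the query
    (cosine similarity over pre-computed embeddings)."""
    doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norm @ q_norm
    return np.argsort(scores)[::-1][:k]

def build_prompt(question: str, snippets: list[str]) -> str:
    """Prepend the retrieved PennyLane examples to the generation prompt."""
    context = "\n\n".join(snippets)
    return f"Reference snippets:\n{context}\n\nTask: {question}"
```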
- 6.3
BAGEL: Benchmarking Animal Knowledge Expertise in Language Models
arXiv:2604.16241v1 Announce Type: new Abstract: Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models…
- 6.3
Optimizing Korean-Centric LLMs via Token Pruning
arXiv:2604.16235v1 Announce Type: new Abstract: This paper presents a systematic benchmark of state-of-the-art multilingual large language models (LLMs) adapted via token pruning - a compression technique…
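The abstract is cut off mid-description; one common form of token pruning for language-specific adaptation is vocabulary trimming, i.e., dropping embedding rows for tokens that never occur in the target-language corpus. A hedged sketch of that general idea (not necessarily the pruning criterion this paper benchmarks):

```python
import torch

def trim_vocabulary(embedding: torch.Tensor, corpus_token_ids: set[int]):
    """Keep only embedding rows for tokens observed in the target corpus.
    Returns the pruned embedding matrix and an old-id -> new-id mapping."""
    keep = sorted(corpus_token_ids)
    remap = {old: new for new, old in enumerate(keep)}
    return embedding[torch.tensor(keep)], remap
```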
- 6.3
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
arXiv:2511.13131v2 Announce Type: replace Abstract: Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they…
- 6.3
CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
arXiv:2604.15840v1 Announce Type: new Abstract: Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent's evolving behavior and…