Posts
All the articles I've posted.
- 3.2
MoDora: Tree-Based Semi-Structured Document Analysis System
arXiv:2602.23061v3 Announce Type: replace-cross Abstract: Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and…
- 3.2
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
arXiv:2507.22359v4 Announce Type: replace-cross Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical…
- 3.2
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
arXiv:2510.05159v4 Announce Type: replace-cross Abstract: While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical…
- 3.2
DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal
arXiv:2601.18081v2 Announce Type: replace Abstract: Despite the growing adoption of large language models (LLMs) in scientific research workflows, automated support for academic rebuttal, a crucial step…
- 3.2
INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT
arXiv:2512.14732v2 Announce Type: replace Abstract: Incidental findings in CT scans, though often benign, can have significant clinical implications and should be reported following established guidelines…
- 3.2
Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention
arXiv:2603.20640v2 Announce Type: replace Abstract: Multi-Agent Debate has emerged as a promising framework for improving the reasoning quality of large language models through iterative inter-agent communication…
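For readers unfamiliar with the framework the abstract refers to, a debate round works roughly like this: agents answer independently, then iteratively revise after reading each other's messages. A minimal Python sketch, assuming a hypothetical `call_llm` helper; the paper's diversity-aware retention rule is not reproduced here, only the point where it would prune context is marked.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call."""
    raise NotImplementedError("wire up a model client here")


def debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> list[str]:
    # Round 0: each agent answers independently.
    answers = [call_llm(f"Question: {question}\nAnswer with reasoning.")
               for _ in range(n_agents)]
    for _ in range(n_rounds):
        # Each agent revises after reading the others' messages. A
        # retention policy would prune this context here instead of
        # forwarding every message verbatim.
        answers = [call_llm(f"Question: {question}\nOther agents said:\n"
                            + "\n".join(a for j, a in enumerate(answers) if j != i)
                            + "\nRevise your answer.")
                   for i in range(n_agents)]
    return answers
```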
- 3.2
Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following
arXiv:2510.14420v4 Announce Type: replace Abstract: Language models often struggle to follow multi-constraint instructions that are crucial for real-world applications. Existing reinforcement learning…
- 3.2
All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
arXiv:2604.12335v1 Announce Type: cross Abstract: Training multimodal large language models (MLLMs) for video understanding requires large-scale annotated data spanning diverse tasks such as object co…
- 3.2
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving
arXiv:2604.12171v1 Announce Type: cross Abstract: Pipeline parallelism (PP) is widely used to partition layers of large language models (LLMs) across GPUs, enabling scalable inference for large models…
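For context on the scheme being reconfigured: pipeline parallelism gives each GPU a contiguous block of layers and streams microbatches stage to stage. A minimal sketch of the static partitioning step; the even split below is an illustrative assumption, not PipeLive's reconfiguration algorithm.

```python
def partition_layers(n_layers: int, n_gpus: int) -> list[range]:
    """Assign contiguous layer ranges to pipeline stages, one per GPU."""
    base, extra = divmod(n_layers, n_gpus)
    stages, start = [], 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)  # spread any remainder
        stages.append(range(start, start + size))
        start += size
    return stages


# e.g. 32 layers over 4 GPUs -> [range(0, 8), range(8, 16), range(16, 24), range(24, 32)]
print(partition_layers(32, 4))
```

Dynamic serving is presumably hard precisely because changing `n_gpus` mid-flight invalidates this mapping while requests are still in progress.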
- 3.2
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
arXiv:2604.13016v1 Announce Type: cross Abstract: On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood…
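For readers new to the technique: in OPD the student samples its own sequences and is trained to match the teacher's per-token distribution on them, in contrast to off-policy distillation on teacher-generated text. A minimal sketch of one common loss choice (reverse KL); the paper studies the dynamics of this setup rather than prescribing this exact form.

```python
import torch
import torch.nn.functional as F


def opd_loss(student_logits: torch.Tensor,
             teacher_logits: torch.Tensor) -> torch.Tensor:
    """Per-token reverse KL, KL(student || teacher), averaged over tokens.

    Both tensors have shape (batch, seq_len, vocab) and are computed on
    sequences the student itself generated.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()
```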
- 3.2
OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
arXiv:2604.12782v1 Announce Type: new Abstract: While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation…
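For context, the common trick behind outlier-aware low-bit quantization: pull the few highest-magnitude channels out of the tensor, keep them in higher precision, and quantize the rest to int4. A minimal NumPy sketch under that assumption; OSC's actual channel-dimension separation is more involved than this.

```python
import numpy as np


def quantize_w4_with_outliers(x: np.ndarray, k: int = 8):
    """Symmetric int4 quantization of (tokens, channels) activations,
    with the k largest-magnitude channels kept in full precision."""
    channel_max = np.abs(x).max(axis=0)
    outlier_idx = np.argsort(channel_max)[-k:]       # worst offenders
    body = np.delete(x, outlier_idx, axis=1)         # everything else
    scale = np.abs(body).max() / 7.0                 # int4 range: [-8, 7]
    q = np.clip(np.round(body / scale), -8, 7).astype(np.int8)
    return q, scale, outlier_idx, x[:, outlier_idx]
```

Without the separation step, one extreme channel inflates `scale` and crushes every other channel into a handful of quantization levels, which is the accuracy degradation the abstract mentions.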
- 3.2
From Plan to Action: How Well Do Agents Follow the Plan?
arXiv:2604.12147v1 Announce Type: cross Abstract: Agents aspire to eliminate the need for task-specific prompt crafting through autonomous reason-act-observe loops. Still, they are commonly instructed…
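The reason-act-observe loop mentioned above, in sketch form: the model alternates between free-form reasoning, a tool call, and folding the tool's output back into its context. `llm`, the `Action(...)` line format, and the toy parser are all illustrative assumptions, not any particular framework's API.

```python
def parse_action(step: str) -> tuple[str, str]:
    """Toy parser for a line like 'Action(search, capital of France)'."""
    inner = step.split("Action(", 1)[1].rstrip(")\n ")
    name, _, args = inner.partition(",")
    return name.strip(), args.strip()


def react_loop(task: str, tools: dict, llm, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Reason: the model emits a thought plus an action or a final answer.
        step = llm(transcript + "Thought, then Action(tool, args) or Final(answer):")
        if "Final(" in step:
            return step
        # Act: run the chosen tool on the model's arguments.
        name, args = parse_action(step)
        observation = tools[name](args)
        # Observe: append the result so the next iteration can use it.
        transcript += f"{step}\nObservation: {observation}\n"
    return "stopped: max steps reached"
```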
- 3.2
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
arXiv:2604.12995v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to…
- 3.2
StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback
arXiv:2510.20093v2 Announce Type: replace-cross Abstract: Although recent advancements in diffusion models have significantly enriched the quality of generated images, challenges remain in synthesizing…
- 3.2
Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration
arXiv:2604.12843v1 Announce Type: new Abstract: The rapid release of both language models and benchmarks makes it increasingly costly to evaluate every model on every dataset. In practice, models are…
- 3.2
LLM-Enhanced Log Anomaly Detection: A Comprehensive Benchmark of Large Language Models for Automated System Diagnostics
arXiv:2604.12218v1 Announce Type: new Abstract: System log anomaly detection is critical for maintaining the reliability of large-scale software systems, yet traditional methods struggle with the heterogeneity…
- 3.2
InsightFlow: LLM-Driven Synthesis of Patient Narratives for Mental Health into Causal Models
arXiv:2604.12721v1 Announce Type: new Abstract: Clinical case formulation organizes patient symptoms and psychosocial factors into causal models, often using the 5P framework. However, constructing such…
- 3.2
Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting
arXiv:2604.12503v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown remarkable capabilities across various tasks but remain prone to hallucinations in knowledge-intensive scenarios…
- 3.2
Calibrated Confidence Estimation for Tabular Question Answering
arXiv:2604.12491v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed for tabular question answering, yet calibration on structured data is largely unstudied. This paper…
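The standard yardstick behind such calibration claims is expected calibration error: bin answers by the model's stated confidence and compare each bin's mean confidence against its empirical accuracy. A minimal sketch; binning choices vary across papers, and nothing here is specific to this one.

```python
import numpy as np


def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Expected calibration error: frequency-weighted gap between
    stated confidence and observed accuracy across bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            total += (mask.sum() / n) * gap
    return total


# A model that says 0.9 but is right only 60% of the time contributes
# a 0.3 gap, weighted by how often it says 0.9.
```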
- 3.2
Sample Complexity of Autoregressive Reasoning: Chain-of-Thought vs. End-to-End
arXiv:2604.12013v1 Announce Type: new Abstract: Modern large language models generate text autoregressively, producing tokens one at a time. To study the learnability of such systems, Joshi et al. (CO…
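The generation process under study, in sketch form: every new token is sampled conditioned on all tokens produced so far, so a chain-of-thought simply spends extra autoregressive steps on intermediate tokens before committing to an answer, while end-to-end prediction emits the answer directly. `model` is any callable returning `(1, seq_len, vocab)` logits; nothing below is specific to the paper's analysis.

```python
import torch


@torch.no_grad()
def generate(model, prompt_ids: torch.Tensor, max_new: int = 32,
             eos_id: int = 2) -> torch.Tensor:
    """Sample up to max_new tokens, one at a time, from `model`."""
    ids = prompt_ids                            # shape (1, seq_len)
    for _ in range(max_new):
        logits = model(ids)[:, -1, :]           # logits for the next token only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1)   # sample a single token
        ids = torch.cat([ids, next_id], dim=1)  # condition on it next step
        if next_id.item() == eos_id:            # stop at end-of-sequence
            break
    return ids
```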