Engineering Practice
249 articles
- 7.0
Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors
SWIFT amortizes agentic workflow design via few-shot transfer of structural priors, replacing per-task combinatorial search
- 7.0
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
The AHE framework automatically evolves coding-agent harnesses via observability-driven feedback, tackling heterogeneous action spaces and sparse signals
- 5.5
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
Agora-Opt, a modular agent framework combining decentralized debate with a read-write memory bank for optimization modeling
- 6.0
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
Multiple concurrent inference agents share a single asymmetrically compressed KV cache pool, with keys quantized to int8 and values to int4
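The summary above names the asymmetric scheme (int8 keys, int4 values) but not its mechanics. A minimal sketch of such asymmetric symmetric-scale quantization, assuming simple per-tensor scales (PolyKV's actual pooling and compression scheme is not shown here):

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    # Symmetric per-tensor quantization to a signed `bits`-bit range.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 64)).astype(np.float32)
values = rng.standard_normal((16, 64)).astype(np.float32)

# Asymmetric compression: keys keep 8 bits, values only 4.
qk, sk = quantize(keys, bits=8)
qv, sv = quantize(values, bits=4)

k_err = np.abs(dequantize(qk, sk) - keys).mean()
v_err = np.abs(dequantize(qv, sv) - values).mean()
```

The asymmetry reflects that attention scores are typically more sensitive to key precision than to value precision, so the cheaper 4-bit budget is spent on values.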
- 5.0
Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning
A robust text-to-SQL method requiring neither CoT nor fine-tuning, at an average cost far below existing SOTA methods
- 5.5
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
A distributed serving system for any-to-any multimodal models, addressing the challenge that different requests follow different compute-graph paths
- 5.0
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
A 99,094-sample counterfactual entity-substitution dataset for training RAG models that prefer the given context over parametric memory
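To make "counterfactual entity substitution" concrete: the answer entity in a passage is swapped so that a context-faithful model must follow the passage rather than its parametric memory. A toy illustration, assuming a simple string swap (the dataset's actual construction pipeline is not shown, and `substitute_entity` is a made-up helper name):

```python
def substitute_entity(context: str, original: str, counterfactual: str):
    """Swap the answer entity in a passage; the new gold answer
    contradicts what the model memorized during pretraining."""
    assert original in context
    return context.replace(original, counterfactual), counterfactual

context = "The Eiffel Tower is located in Paris."
perturbed, gold = substitute_entity(context, "Paris", "Rome")
# A context-faithful RAG model reading `perturbed` should answer
# "Rome", even though its parametric memory says "Paris".
```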
- 4.5
WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time ASR
WhisperPipe delivers real-time streaming ASR with bounded memory, striking a better trade-off between transcription accuracy and compute efficiency
- 5.0
Learning Illumination Control in Diffusion Models
A fully open-source, reproducible pipeline for illumination control in diffusion models, without heavy control inputs such as depth maps
- 6.0
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
Trains custom policy-guardrail classifiers on training data synthesized via asymmetric debate, reducing the need for costly annotation
- 5.0
MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment
A hardware-in-the-loop architecture-search methodology for designing on-device LLMs under mobile latency constraints
- 5.0
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
Voyager, a training-free method that uses LLMs to generate diverse datasets by iteratively optimizing a mathematical diversity metric
- 5.0
From Local to Global: Revisiting Structured Pruning Paradigms for LLMs
Revisits structured pruning paradigms for LLMs, moving from local reconstruction objectives to global task objectives
- 5.0
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention
QFlash tackles the three main obstacles to fully integer FlashAttention in ViTs
- 4.5
ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
ARQ, a mixed-precision quantization method that jointly optimizes execution efficiency and certifiable robustness
- 5.5
Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling
Revisits the effectiveness of LLM pruning for test-time scaling, finding that structured pruning significantly harms reasoning ability
- 5.0
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
A green compression pipeline: examining LLM compression strategies through the lens of carbon footprint
- 4.5
Feasible-First Exploration for Constrained ML Deployment Optimization
Feasible-first exploration for constrained ML deployment optimization in crash-prone, hierarchical mixed-variable search spaces
- 4.5
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
Fisher-information-guided token quantization for communication-efficient federated fine-tuning of LLMs on edge devices
- 4.5
Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation
Backtranslation-augmented DPO for NMT post-training, requiring no parallel corpora