Engineering Practice
249 articles
- 7.0
Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors
SWIFT amortizes agentic workflow design via few-shot transfer of structural priors, replacing per-task combinatorial search
- 7.0
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
The AHE framework automatically evolves coding-agent harnesses via observability-driven feedback, tackling heterogeneous action spaces and sparse signals
- 5.5
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
Agora-Opt, a modular agent framework combining decentralized debate with a read-write memory bank for optimization modeling
- 6.0
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
Multiple concurrent inference agents share a single asymmetrically compressed KV cache pool, with keys quantized to int8 and values to int4
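The summary above names the asymmetric scheme (int8 keys, int4 values) but not its mechanics. A minimal sketch of such asymmetric symmetric-scale quantization, assuming simple per-tensor scales (PolyKV's actual pooling and compression scheme is not shown here):

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    # Symmetric per-tensor quantization to a signed `bits`-bit range.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 64)).astype(np.float32)
values = rng.standard_normal((16, 64)).astype(np.float32)

# Asymmetric compression: keys keep 8 bits, values only 4.
qk, sk = quantize(keys, bits=8)
qv, sv = quantize(values, bits=4)

k_err = np.abs(dequantize(qk, sk) - keys).mean()
v_err = np.abs(dequantize(qv, sv) - values).mean()
```

The asymmetry reflects that attention scores are typically more sensitive to key precision than to value precision, so the cheaper 4-bit budget is spent on values.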
- 5.0
Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning
A robust text-to-SQL method requiring neither CoT nor fine-tuning, at an average cost far below existing SOTA methods
- 5.5
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
A distributed serving system for any-to-any multimodal models, addressing the challenge that different requests follow different compute-graph paths
- 5.0
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
A 99,094-sample counterfactual entity-substitution dataset for training RAG models that prefer the given context over parametric memory
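To make "counterfactual entity substitution" concrete: the answer entity in a passage is swapped so that a context-faithful model must follow the passage rather than its parametric memory. A toy illustration, assuming a simple string swap (the dataset's actual construction pipeline is not shown, and `substitute_entity` is a made-up helper name):

```python
def substitute_entity(context: str, original: str, counterfactual: str):
    """Swap the answer entity in a passage; the new gold answer
    contradicts what the model memorized during pretraining."""
    assert original in context
    return context.replace(original, counterfactual), counterfactual

context = "The Eiffel Tower is located in Paris."
perturbed, gold = substitute_entity(context, "Paris", "Rome")
# A context-faithful RAG model reading `perturbed` should answer
# "Rome", even though its parametric memory says "Paris".
```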
- 4.5
WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time ASR
WhisperPipe delivers real-time streaming ASR with bounded memory, striking a better trade-off between transcription accuracy and compute efficiency
- 5.0
Learning Illumination Control in Diffusion Models
A fully open-source, reproducible pipeline for illumination control in diffusion models, without heavy control inputs such as depth maps
- 6.0
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
Trains custom policy-guardrail classifiers on training data synthesized via asymmetric debate, reducing the need for costly annotation
- 5.0
MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment
A hardware-in-the-loop architecture-search methodology for designing on-device LLMs under mobile latency constraints
- 5.0
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
Voyager, a training-free method that uses LLMs to generate diverse datasets by iteratively optimizing a mathematical diversity metric
- 5.0
From Local to Global: Revisiting Structured Pruning Paradigms for LLMs
Revisits structured pruning paradigms for LLMs, moving from local reconstruction objectives to global task objectives
- 5.0
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention
QFlash tackles the three main obstacles to fully integer FlashAttention in ViTs
- 4.5
ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
ARQ, a mixed-precision quantization method that jointly optimizes execution efficiency and certifiable robustness
- 5.5
Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling
Revisits the effectiveness of LLM pruning for test-time scaling, finding that structured pruning significantly harms reasoning ability
- 5.0
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
A green compression pipeline: examining LLM compression strategies through the lens of carbon footprint
- 4.5
Feasible-First Exploration for Constrained ML Deployment Optimization
Feasible-first exploration for constrained ML deployment optimization in crash-prone, hierarchical mixed-variable search spaces
- 4.5
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
Fisher-information-guided token quantization for communication-efficient federated fine-tuning of LLMs on edge devices
- 4.5
Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation
Backtranslation-augmented DPO for NMT post-training, requiring no parallel corpora