
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving


Score 3.2 · Source: cs.LG updates on arXiv.org · Published 2026-04-15

Scoring rationale: Moderate AI relevance +practical(3)

arXiv:2604.12171v1 Announce Type: cross

Abstract: Pipeline parallelism (PP) is widely used to partition layers of large language models (LLMs) across GPUs, enabling scalable inference for large models. However, existing systems rely on static PP configurations that fail to adapt to dynamic settings, such as serverless platforms and heterogeneous GPU environments. Reconfiguring PP by stopping and redeploying service incurs prohibitive downtime, so reconfiguration must instead proceed live and in…
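To make the reconfiguration problem concrete, here is a minimal sketch (hypothetical, not the PipeLive implementation) of balanced layer partitioning for pipeline parallelism, and of computing which layers would have to migrate between GPUs when the stage count changes, e.g. when a GPU is added or removed on a serverless platform:

```python
# Hypothetical sketch: balanced contiguous layer partitioning for pipeline
# parallelism, plus the set of layer migrations a live reconfiguration
# would need to perform when the number of pipeline stages changes.

def partition(n_layers: int, n_stages: int) -> list[range]:
    """Split n_layers contiguous layers into n_stages balanced stages."""
    base, extra = divmod(n_layers, n_stages)
    stages, start = [], 0
    for s in range(n_stages):
        size = base + (1 if s < extra else 0)  # earlier stages take the remainder
        stages.append(range(start, start + size))
        start += size
    return stages

def migrations(old: list[range], new: list[range]) -> list[tuple[int, int, int]]:
    """Layers that must move between stages: (layer, old_stage, new_stage)."""
    owner_old = {l: s for s, r in enumerate(old) for l in r}
    owner_new = {l: s for s, r in enumerate(new) for l in r}
    return [(l, owner_old[l], owner_new[l])
            for l in sorted(owner_old) if owner_old[l] != owner_new[l]]

old = partition(32, 4)          # 4 GPUs, 8 layers each
new = partition(32, 3)          # one GPU removed: stages of 11, 11, 10 layers
moves = migrations(old, new)    # the weight transfers a live system must schedule
```

Even in this toy setting, shrinking from 4 to 3 stages forces a sizable fraction of the 32 layers to change owners, which is why a stop-and-redeploy approach incurs the downtime the abstract describes, and why the paper pursues a live, in-place alternative.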