Score 3.2 · Source: cs.LG updates on arXiv.org · Published 2026-04-15
Scoring rationale: moderate AI relevance + practical (3)
arXiv:2604.12171v1 Announce Type: cross Abstract: Pipeline parallelism (PP) is widely used to partition the layers of large language models (LLMs) across GPUs, enabling scalable inference for large models. However, existing systems rely on static PP configurations that fail to adapt to dynamic settings, such as serverless platforms and heterogeneous GPU environments. Reconfiguring PP by stopping and redeploying the service incurs prohibitive downtime, so reconfiguration must instead proceed live and in…
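To make the static-partitioning setup the abstract criticizes concrete, here is a minimal sketch of how a pipeline-parallel system might assign contiguous layer ranges to GPU stages. This is an illustrative assumption, not the paper's method: real systems balance stages by per-layer compute and memory cost, not by layer count, and the function name `partition_layers` is hypothetical.

```python
# Hypothetical sketch: split an LLM's layers into contiguous pipeline stages,
# one stage per GPU, as evenly as possible by layer count.
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Return a list of layer ranges, one per pipeline stage."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        # The first `extra` stages absorb one leftover layer each.
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages

# e.g. 32 transformer layers over 4 GPUs -> 8 layers per stage
print(partition_layers(32, 4))
```

Because the assignment is fixed at deploy time, changing the GPU count (e.g. on a serverless platform) forces a full repartition, which is exactly the reconfiguration problem the paper targets.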