星际流动

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Academic frontier, score 6.3 — an approach to training 100B+-parameter models on a single GPU
Source: cs.CL updates on arXiv.org

Score 6.3 · Source: cs.CL updates on arXiv.org · Published 2026-04-08

Scoring rationale: an approach to training 100B+-parameter models on a single GPU

arXiv:2604.05091v1 Announce Type: new

Abstract: We present MegaTrain, a memory-centric system that efficiently trains 100B+-parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host (CPU) memory and treats GPUs as transient compute engines: for each layer, it streams parameters in and computes gradients out, minimizing persistent device state. To combat the CPU-GPU bandwidth bottleneck, we adopt two key optimizations. 1) We introduce a pipelined double-buffered execution engine that o
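The abstract is truncated here, but the double-buffered streaming idea it describes can be illustrated with a minimal sketch: while the device computes layer i, a background copy fetches layer i+1's parameters from host memory into the alternate buffer. This is a hypothetical illustration of the general pattern, not the paper's implementation; all names (`host_params`, `fetch`, `compute`) are assumptions, and plain Python lists and a thread stand in for CPU memory, GPU buffers, and the copy engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Host-resident "parameters": one list of floats per layer (stand-in for CPU memory).
host_params = [[float(i)] * 4 for i in range(6)]

def fetch(layer_idx):
    """Simulate a host-to-device copy: return a device-side buffer for this layer."""
    return list(host_params[layer_idx])

def compute(activations, params):
    """Simulate one layer's forward pass on the streamed-in buffer."""
    return [a + p for a, p in zip(activations, params)]

def pipelined_forward(x):
    """Double-buffered execution: overlap fetching layer i+1 with computing layer i."""
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(fetch, 0)               # prefetch the first layer
        for i in range(len(host_params)):
            params = pending.result()                   # wait for layer i's buffer
            if i + 1 < len(host_params):
                pending = copier.submit(fetch, i + 1)   # start copying layer i+1
            x = compute(x, params)                      # compute while the copy runs
    return x

print(pipelined_forward([0.0, 0.0, 0.0, 0.0]))
```

In a real system the `fetch` step would be an asynchronous host-to-device DMA transfer into one of two pinned device buffers, so the copy for the next layer hides behind the compute of the current one whenever compute time exceeds transfer time.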

