ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling

Published

2026-04-14

Collected 2026-04-14 04:31

Academic Frontier, 6.0 points: above average, with some new information and reference value

Score 6.0 · Source: cs.AI updates on arXiv.org · Published 2026-04-14

Scoring rationale: above average, with some new information and reference value

arXiv:2603.22911v2 Announce Type: replace-cross Abstract: Because it greatly reduces computation and memory overhead, token compression has become a research hotspot for MLLMs and has achieved remarkable progress on image-language tasks. For video, however, existing methods still fall short of high-ratio token compression. We attribute this shortcoming to insufficient modeling of temporal and continual video content, and propose a novel, training-free token pruning method for video…