Skip to content
星际流动

ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling

发布
采集
学术前沿 6.0 分 — 中等偏上:有一定信息增量和参考价值
原文: cs.AI updates on arXiv.org

评分 6.0 · 来源:cs.AI updates on arXiv.org · 发布于 2026-04-14

评分依据:中等偏上:有一定信息增量和参考价值

ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling

arXiv:2603.22911v2 Announce Type: replace-cross Abstract: Due to the great saving of computation and memory overhead, token compression has become a research hot-spot for MLLMs and achieved remarkable progress in image-language tasks. However, for the video, existing methods still fall short of high-ratio token compression. We attribute this shortcoming to the insufficient modeling of temporal and continual video content, and propose a novel and training-free token pruning method for video…