VEGA-3D: Video Generation Models Already Understand 3D Space

A plug-and-play framework extracts 3D spatial knowledge from frozen video generation models, boosting 3D localization by 4.5 points with zero 3D training data.
artificial-intelligence
Author

Kabui, Charles

Published

2026-04-02

Keywords

vega-3d, video-diffusion, 3d-scene-understanding, spatial-reasoning, multimodal-llm