daVinci-MagiHuman: One Model Generates Synchronized Video and Audio in 2 Seconds

A 15B-parameter single-stream Transformer generates synchronized talking-head video and audio in 2 seconds, beating multi-stream rivals with an 80% win rate in human evaluation.
artificial-intelligence
Author: Kabui, Charles

Published: 2026-03-29

Keywords: davinci-magihuman, audio-video-generation, single-stream-transformer, open-source-ai, multimodal-generation