Microsoft Ships In-House Transcription, Voice Cloning, and Image Generation Models

Microsoft released three first-party AI models on Foundry: transcription with a 3.9% word error rate across 25 languages, voice generation that produces 60 seconds of audio per second, and image generation ranked #3 globally.
artificial-intelligence
Author

Kabui, Charles

Published

2026-04-05

Keywords

microsoft-mai, speech-to-text, voice-synthesis, image-generation, microsoft-foundry