Fish Audio S2: Open-Source Text-to-Speech Beats Google and OpenAI in Blind Listening Tests

Fish Audio’s open-source S2 model wins 81.88% of blind comparisons against OpenAI’s gpt-4o-mini-tts, runs in near real time on a single GPU, and reads inline stage directions such as [whisper] to steer emotion.
artificial-intelligence
Author

Kabui, Charles

Published

2026-06-01

Keywords

text-to-speech, fish-audio-s2, voice-cloning, open-source-tts, inline-emotion-tags