Fish
The most expressive open-source voice AI model for realistic and conversational speech generation.
About Fish
Fish Audio S2 is an open-source text-to-speech (TTS) model built for expressive, fast, and flexible speech generation. It produces realistic, natural-sounding speech with fine-grained control over emotion, paralanguage, and multi-speaker conversations. Designed for real-time use, S2 targets very low latency, which makes it suitable for conversational AI, live dubbing, and interactive voice experiences.
S2 accepts natural-language instructions embedded directly in the input text, which gives localized control over delivery: users can add laughter, whispers, sighs, or a specific tone at a chosen point in the speech. Both the inference code and the model weights are open source, so developers can run S2 on their own infrastructure, fine-tune it on custom data, and integrate it without vendor lock-in. It supports more than 80 languages and ships with a streaming inference engine built on SGLang.
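To make the idea of inline instructions concrete, here is a minimal Python sketch of how a script with embedded cues might be assembled before it is sent to a TTS model. It only builds a string and does not call S2. The `Segment` class, the `render_script` helper, and the square-bracket delimiters are all hypothetical and chosen for illustration; the markup S2 actually accepts should be taken from its own documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One piece of a script: spoken text plus an optional inline cue."""
    text: str
    cue: Optional[str] = None  # free-form instruction, e.g. "whispers"


def render_script(
    segments: List[Segment],
    open_mark: str = "[",   # assumed delimiter, not confirmed by the source
    close_mark: str = "]",  # assumed delimiter, not confirmed by the source
) -> str:
    """Join segments into one string, placing each cue right before its text."""
    parts = []
    for seg in segments:
        text = seg.text.strip()
        if seg.cue:
            parts.append(f"{open_mark}{seg.cue.strip()}{close_mark} {text}")
        else:
            parts.append(text)
    return " ".join(parts)


script = render_script([
    Segment("I can't believe it worked."),
    Segment("Don't tell anyone yet.", cue="whispers"),
    Segment("Okay, fine, tell everyone.", cue="laughs"),
])
print(script)
```

Keeping the cues as data rather than hand-typed markup means the delimiters can be changed in one place once the real syntax is known.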
