The most expressive open-source voice AI model for realistic and conversational speech generation.

1 product

About Fish

Fish Audio S2 is an open-source text-to-speech (TTS) model built for expressiveness, speed, and flexibility. It generates realistic, natural-sounding speech with fine-grained control over emotion, paralanguage, and multi-speaker conversations. S2 is designed for real-time use: its low latency makes it suitable for conversational AI, live dubbing, and interactive voice experiences. Control is localized, meaning that natural-language instructions embedded directly in the input text add laughter, whispers, sighs, or a specific tone at the point where they appear. Both the inference code and the model weights are openly available, so developers can run S2 on their own infrastructure, fine-tune it on custom data, and integrate it without vendor lock-in. S2 supports more than 80 languages and uses an SGLang-based streaming inference engine for optimized performance.
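
To make the idea of instructions embedded in the text more concrete, here is a minimal Python sketch that assembles such an input string. It does not call Fish Audio S2 or any of its APIs. The square-bracket cue syntax, the `Segment` type, and the `render_prompt` helper are all assumptions made for illustration, so check the project's own documentation for the syntax the model actually expects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One span of text, optionally preceded by a delivery cue."""
    text: str
    cue: Optional[str] = None  # free-form cue such as "whisper" (hypothetical)


def render_prompt(segments: List[Segment]) -> str:
    """Join segments into one string, placing each cue inline before its text.

    ASSUMPTION: cues are written as "[cue]" in front of the text they affect.
    This bracket form is illustrative only and is not taken from the S2 docs.
    """
    parts = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty spans so no stray cue is emitted
        if seg.cue:
            parts.append(f"[{seg.cue.strip()}] {text}")
        else:
            parts.append(text)
    return " ".join(parts)


prompt = render_prompt([
    Segment("I can't believe it worked.", cue="laughing"),
    Segment("Keep this between us.", cue="whisper"),
    Segment("Anyway, back to the demo."),
])
print(prompt)
```

Because each cue sits next to the text it applies to, the control stays local: only the first sentence is marked as laughing and only the second as whispered, while the third carries no cue at all.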

Products by Fish