Deepgram is the best-value speech AI platform for developers, and it is not close.
The $200 free credit with no expiration gives you roughly 26,000 minutes of Nova-3 transcription at list price to build and test with, more generous than any competitor. Nova-3 at $0.0077/minute ($0.46/hour) delivers accuracy rivaling OpenAI Whisper, with real-time streaming, speaker diarization, and 30+ languages.
Per-second billing is a genuine differentiator: competitors like AssemblyAI and AWS Transcribe round billing up to coarser units or apply per-request minimums (such as 15 seconds), which inflates costs 5-15% on short audio clips. The Growth plan ($4K/year minimum) saves up to 20% and raises WebSocket concurrency from 150 to 225 connections; it pays off once you consistently spend $350+/month.
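To see how billing granularity drives the short-clip overhead, here is a minimal sketch that computes billed seconds under a configurable rounding increment and per-request minimum. The `increment_s` and `minimum_s` parameters are illustrative assumptions, not any provider's documented billing rules, and the same per-minute rate is used in both cases so that only the rounding differs.

```python
import math

def billed_seconds(duration_s: float, increment_s: int = 1, minimum_s: int = 0) -> int:
    """Seconds charged for one clip: round up to the billing increment, then apply any per-request minimum."""
    rounded = math.ceil(duration_s / increment_s) * increment_s
    return max(rounded, minimum_s)

def batch_cost(durations_s: list[float], rate_per_min: float,
               increment_s: int = 1, minimum_s: int = 0) -> float:
    """Total cost in USD for a batch of clips billed at rate_per_min."""
    total_s = sum(billed_seconds(d, increment_s, minimum_s) for d in durations_s)
    return total_s / 60 * rate_per_min

# Hypothetical short clips (voicemails, IVR prompts), durations in seconds.
clips = [8.2, 22.5, 37.1, 51.9]
per_second = batch_cost(clips, 0.0077)
coarse = batch_cost(clips, 0.0077, increment_s=15)
print(f"per-second: ${per_second:.4f}, 15-second increments: ${coarse:.4f}")
```

The actual overhead depends entirely on your clip-length distribution, so run it against a sample of your own durations rather than relying on any single percentage.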
The Voice Agent API ($0.050-0.163/min depending on tier and which components you bring yourself) positions Deepgram as a full voice AI stack, not just a transcription service. Main caveat: the older Enhanced and Base models cost more than Nova while being less accurate, so use Nova-3 or Flux unless you have a specific legacy reason.
Pay As You Go: Free to start. Usage-based, with a $200 free credit (roughly 26,000 minutes of Nova-3 transcription)
Growth: $4,000/year (minimum). Pre-paid credits with up to 20% savings and higher concurrency
Enterprise: Custom pricing for large volumes, specific deployment, or support needs
Add-on features stack up: speaker diarization ($0.002/min) and PII redaction ($0.002/min) each add ~26% to the base Nova-3 price. A fully featured transcription pipeline (Nova-3 + diarization + redaction + keyterm prompting) costs $0.0130/min vs $0.0077 base, a 69% premium.
Multilingual Nova-3 costs 19% more than monolingual ($0.0092/min vs $0.0077/min). If your audio is consistently in one language, always use the monolingual model to save.
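A rough way to budget the add-on stack is to sum per-minute rates. This sketch hard-codes the list prices quoted in this review; the $0.0013/min keyterm prompting rate is not stated directly but inferred from the $0.0130/min fully featured total, and the code assumes add-ons cost the same on the multilingual model. Treat all figures as placeholders to check against Deepgram's current pricing.

```python
# Per-minute list prices quoted in this review (USD). Verify against Deepgram's pricing page.
BASE_RATE = {
    "nova-3": 0.0077,               # monolingual
    "nova-3-multilingual": 0.0092,
}
ADD_ON_RATE = {
    "diarization": 0.002,
    "redaction": 0.002,
    # Inferred: $0.0130 fully featured total minus base, diarization, and redaction.
    "keyterm_prompting": 0.0013,
}

def per_minute_rate(model: str, add_ons: tuple[str, ...] = ()) -> float:
    """Base model rate plus each requested add-on (assumes add-ons price the same on every model)."""
    return BASE_RATE[model] + sum(ADD_ON_RATE[a] for a in add_ons)

def monthly_cost(hours: float, model: str, add_ons: tuple[str, ...] = ()) -> float:
    """Estimated USD for `hours` of audio per month."""
    return hours * 60 * per_minute_rate(model, add_ons)
```

For example, dividing `per_minute_rate("nova-3", ("diarization", "redaction", "keyterm_prompting"))` by `per_minute_rate("nova-3")` gives about 1.69, matching the 69% premium, and `monthly_cost(100, "nova-3")` gives about $46.20 for 100 hours.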
The $200 free credit is not renewable — once spent, you are on metered billing. Auto-reload defaults to $100 when balance drops to $10, which can surprise developers who forget to set billing alerts.
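The auto-reload behavior is easier to reason about with a small simulation. This is a hypothetical model, not Deepgram's documented logic: it assumes that after each day's usage is deducted, any balance at or below the $10 threshold triggers a $100 top-up, repeated until the balance clears the threshold.

```python
def simulate_auto_reload(start_balance: float, daily_spend: list[float],
                         threshold: float = 10.0,
                         reload_amount: float = 100.0) -> tuple[float, int]:
    """Hypothetical auto-reload model. Returns (final_balance, number_of_reloads).

    After each day's usage is deducted, a balance at or below `threshold`
    triggers a top-up of `reload_amount`, repeated until it clears the threshold.
    """
    if reload_amount <= 0:
        raise ValueError("reload_amount must be positive")
    balance = start_balance
    reloads = 0
    for spend in daily_spend:
        balance -= spend
        while balance <= threshold:
            balance += reload_amount
            reloads += 1
    return balance, reloads
```

Starting from the $200 credit and spending $20/day, `simulate_auto_reload(200, [20] * 30)` reports five reloads in 30 days, i.e. $500 charged to the card in the first month under these assumptions.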
Whisper Cloud (the Whisper model hosted by Deepgram) is limited to 5 concurrent connections on all plans. If you need Whisper specifically at scale, you may need to self-host it.
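With a hard cap of 5 concurrent connections, batch jobs should throttle on the client side. A minimal sketch using `asyncio.Semaphore`, assuming a hypothetical `transcribe` coroutine that stands in for whatever request your Deepgram client actually makes (here it only sleeps, and no SDK call is implied):

```python
import asyncio

MAX_CONCURRENT = 5  # Whisper Cloud connection cap cited in this review

async def transcribe(clip_id: int) -> str:
    """Hypothetical stand-in for a real transcription request; it only sleeps."""
    await asyncio.sleep(0.01)
    return f"transcript-{clip_id}"

async def transcribe_all(clip_ids, limit: int = MAX_CONCURRENT) -> tuple[list[str], int]:
    """Run all requests with at most `limit` in flight; also report the peak observed."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(clip_id: int) -> str:
        nonlocal in_flight, peak
        async with sem:  # waits here while `limit` requests are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await transcribe(clip_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(c) for c in clip_ids))
    return list(results), peak

if __name__ == "__main__":
    transcripts, peak = asyncio.run(transcribe_all(range(20)))
    print(f"{len(transcripts)} transcripts, peak concurrency {peak}")
```

The same pattern applies to the 150/225 WebSocket limits on other plans; only `limit` changes.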
Voice Agent API pricing tiers are confusing: Standard ($0.075/min) includes Deepgram STT + TTS + LLM. BYO TTS drops the price to $0.065/min, BYO LLM to $0.056/min, and BYO both to $0.050/min. The Advanced tier ($0.163/min) costs roughly 2.2x Standard with unclear feature differentiation.
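Because the Voice Agent price depends on which components you bring yourself, a lookup table keeps the tiers straight. This sketch encodes the rates quoted in this review; treating Advanced as a flat rate with no BYO discount is an assumption, since the tier's details are unclear.

```python
# Voice Agent API per-minute list prices quoted in this review (USD).
# Keys are (bring_your_own_tts, bring_your_own_llm).
STANDARD_RATES = {
    (False, False): 0.075,  # Deepgram STT + TTS + LLM
    (True, False): 0.065,   # BYO TTS
    (False, True): 0.056,   # BYO LLM
    (True, True): 0.050,    # BYO TTS and LLM
}
ADVANCED_RATE = 0.163  # assumption: flat rate, no BYO discount

def voice_agent_rate(byo_tts: bool = False, byo_llm: bool = False,
                     advanced: bool = False) -> float:
    """Deepgram's per-minute charge only; BYO providers bill you separately."""
    if advanced:
        return ADVANCED_RATE
    return STANDARD_RATES[(byo_tts, byo_llm)]

def monthly_agent_cost(minutes: float, **options: bool) -> float:
    """Deepgram's share of the monthly bill for `minutes` of agent conversation."""
    return minutes * voice_agent_rate(**options)
```

The BYO rates cover only Deepgram's share: the TTS or LLM provider you bring bills you separately, so compare the full stack rather than the Deepgram line item alone.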
Growth plan requires a $4,000/year minimum commitment. Credits are pre-paid and redeemed against usage — if your usage drops, unused credits do not roll over (verify terms).
Audio Intelligence (summarization, sentiment, topics) uses a separate per-token pricing model ($0.0003-0.0006/1K tokens) that is difficult to estimate in advance since token count depends on audio content length and complexity.
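Token-based pricing can still be ballparked from audio duration. This sketch assumes roughly 150 spoken words per minute and about 1.3 tokens per word, both rules of thumb rather than Deepgram figures, and assumes billed tokens scale with transcript length; the price range is the $0.0003-0.0006/1K tokens quoted above.

```python
def estimate_intelligence_cost(audio_minutes: float,
                               price_per_1k_tokens: float,
                               words_per_minute: float = 150.0,
                               tokens_per_word: float = 1.3) -> float:
    """Ballpark USD cost; words_per_minute and tokens_per_word are rule-of-thumb assumptions."""
    tokens = audio_minutes * words_per_minute * tokens_per_word
    return tokens / 1000 * price_per_1k_tokens

def cost_range(audio_minutes: float, low_price: float = 0.0003,
               high_price: float = 0.0006) -> tuple[float, float]:
    """Low and high estimates across the quoted $/1K-token range."""
    return (estimate_intelligence_cost(audio_minutes, low_price),
            estimate_intelligence_cost(audio_minutes, high_price))
```

Fast talkers, dense jargon, or non-English audio can shift both assumptions, so calibrate them on a few real transcripts before budgeting.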
Developers building voice-enabled applications (meeting bots, call analytics, voice assistants) who need a reliable real-time STT API with sub-300ms latency
Call centers and sales teams processing thousands of hours of recordings monthly: Nova-3 at $0.46/hour is roughly 35-70% cheaper than Google Speech-to-Text ($0.012-0.02/min) or AWS Transcribe ($0.024/min) for equivalent accuracy
Startups building voice agent products who want STT + TTS + voice agent orchestration from a single vendor with unified billing
Companies processing short audio clips (voicemails, IVR, voice messages) where per-second billing saves 5-15% vs competitors that round up
startup
Start on Pay As You Go and build your product against the Deepgram API. The $200 credit covers months of development. Move to Growth ($4K/year) once you are consistently spending $350+/month — the 15-20% savings and higher concurrency limits are meaningful at that scale. The Voice Agent API is worth exploring if you are building a conversational AI product.
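The $350+/month rule of thumb can be checked with a break-even calculation. This sketch uses the Nova-3 rates quoted in this review ($0.0077/min Pay As You Go, $0.0065/min on Growth) and assumes the $4,000 commitment is owed in full even if unused, with usage beyond it billed at the Growth rate; check the actual contract terms.

```python
PAYG_RATE = 0.0077         # USD/min, Nova-3 Pay As You Go (as quoted)
GROWTH_RATE = 0.0065       # USD/min, Nova-3 on Growth (as quoted)
GROWTH_MINIMUM = 4000.0    # USD/year commitment

def annual_cost_payg(monthly_minutes: float) -> float:
    return 12 * monthly_minutes * PAYG_RATE

def annual_cost_growth(monthly_minutes: float) -> float:
    # Assumption: the minimum is owed even if unused; usage above it is billed at GROWTH_RATE.
    return max(GROWTH_MINIMUM, 12 * monthly_minutes * GROWTH_RATE)

def breakeven_monthly_minutes() -> float:
    # GROWTH_RATE < PAYG_RATE, so Growth is cheaper exactly when a year of
    # Pay As You Go usage would cost more than the minimum commitment.
    return GROWTH_MINIMUM / 12 / PAYG_RATE

m = breakeven_monthly_minutes()
print(f"break-even: {m:,.0f} min/month (~{m / 60:,.0f} h), "
      f"${m * PAYG_RATE:,.2f}/month on Pay As You Go")
```

Under these assumptions the break-even is about $333/month of Pay As You Go spend (roughly 720 hours of Nova-3), so the $350 threshold adds a small safety margin.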
enterprise
The Enterprise plan fits 50,000+ hours/month, with custom concurrency, SLA guarantees, and potential on-premises deployment. Negotiate volume discounts: at scale, Deepgram typically offers 30-50% below list rates. The Voice Agent API positions Deepgram as a single vendor for your entire voice stack (STT + TTS + agent orchestration), simplifying procurement.
freelancer
The $200 free credit is the best trial offer in speech AI — use it to build and test without any payment. For ongoing light usage (under 100 hours/month), Pay As You Go at $0.0077/min ($46/month for 100 hours) is extremely affordable. No other provider matches this price-to-accuracy ratio at low volumes.
small business
The Growth plan is the sweet spot for production workloads processing 1,000-10,000 hours/month. At $0.0065/min ($0.39/hour), Deepgram is roughly 2-4x cheaper than Google or AWS for equivalent accuracy. At the base rate that volume costs about $390-3,900/month; budget $500-5,000/month depending on volume and add-ons. Add diarization and redaction only where needed, since they add 26%+ to costs.
Deepgram is the price-performance leader in speech-to-text.
At $0.0077/minute for Nova-3, it is roughly 1.6-2.6x cheaper than Google Cloud Speech-to-Text ($0.012-0.02/min), about 3x cheaper than AWS Transcribe ($0.024/min), and roughly 15-20x cheaper than AssemblyAI ($0.12-0.15/min for their best model). The accuracy gap has narrowed significantly: Nova-3 matches or exceeds Whisper large-v3 on most benchmarks while running in real time.
AssemblyAI differentiates with higher accuracy on difficult audio (heavy accents, background noise) and built-in summarization, making it the choice when accuracy matters more than cost. The OpenAI Whisper API ($0.006/min) is slightly cheaper per minute but offers no streaming and no speaker diarization; it is batch-only. Google and AWS charge more but offer deep ecosystem integration (Vertex AI, Amazon Connect).
For most developers building voice features, Deepgram offers the best combination of price, speed, accuracy, and developer experience. The addition of TTS (Aura) and the Voice Agent API makes it increasingly a one-stop shop for voice AI.
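The price multiples above are simple ratios of per-minute list prices. This sketch recomputes them from the rates quoted in this review, expressing each competitor as a low/high range relative to Nova-3; rates change over time, so treat them as inputs to update rather than current prices.

```python
# Per-minute list prices as quoted in this review (USD): (low, high).
RATES = {
    "Deepgram Nova-3": (0.0077, 0.0077),
    "Google Cloud Speech-to-Text": (0.012, 0.02),
    "AWS Transcribe": (0.024, 0.024),
    "OpenAI Whisper API": (0.006, 0.006),
}

def price_multiple(competitor: str, baseline: str = "Deepgram Nova-3") -> tuple[float, float]:
    """How many times the baseline's per-minute price the competitor charges (low, high)."""
    base = RATES[baseline][0]
    low, high = RATES[competitor]
    return low / base, high / base

for name in RATES:
    lo, hi = price_multiple(name)
    print(f"{name}: {lo:.1f}x-{hi:.1f}x Nova-3's per-minute price")
```

A multiple below 1.0 (as for the Whisper API) means the competitor is cheaper per minute, which is why feature gaps such as streaming and diarization carry the comparison there.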