
Deepinfra
Accelerate your AI with developer-friendly APIs for performant, cost-efficient machine learning inference.
TL;DR - Deepinfra
- Provides APIs for fast, cost-efficient AI model inference.
- Offers a wide range of machine learning models including text, image, and video generation.
- Ensures data privacy and security with SOC 2 and ISO 27001 certifications.
Pricing: Free plan available
Best for: Growing teams
Pros & Cons
Pros
- Fast, simple, and reliable AI inference
- Cost-efficient with pay-as-you-go pricing
- Strong focus on data privacy and security (SOC 2, ISO 27001)
- Wide variety of machine learning models available
- Scalable infrastructure for various business needs
Cons
- Pricing can vary significantly per model and usage type (tokens, execution time)
- No general free tier for model inference; free access is limited to select models (e.g., temporarily free image models)
Key Features
- Developer-friendly APIs for AI inference
- Access to 100+ machine learning models (text generation, image generation, video generation, speech synthesis, etc.)
- Pay-as-you-go pricing model
- On-demand GPU rental (e.g., DGX B200 GPUs)
- Zero retention policy for inputs, outputs, and user data
- SOC 2 and ISO 27001 certified
- Customizable inference solutions (cost, latency, throughput, scale optimization)
- Proprietary inference-optimized infrastructure in US-based data centers
Pricing Plans
moonshotai/Kimi-K2.5 (text-generation)
$0.45/M in • $2.80/M out
- 256k context window
- $0.09 cached / 1M tokens
zai-org/GLM-4.7-Flash (text-generation)
$0.06/M in • $0.40/M out
- bfloat16
- 198k context window
- $0.01 cached / 1M tokens
nvidia/Nemotron-3-Nano-30B-A3B (text-generation)
$0.05/M in • $0.20/M out
- fp4
- 256k context window
NVIDIA DGX B200 (on-demand GPU rental)
$2.49/instance-hour
deepseek-ai/DeepSeek-V3.2 (text-generation)
$0.26/M in • $0.38/M out
- fp4
- 160k context window
- $0.13 cached / 1M tokens
Bria/fibo_edit (text-to-image)
$0.00/image
- Free for a limited time
Bria/video_eraser (text-to-video)
$0.14/second
Bria/video_foreground_mask (text-to-video)
$0.14/second
Bria/video_increase_resolution (text-to-video)
$0.14/second
Bria/video_mask_by_key_points (text-to-video)
$0.14/second
Bria/video_mask_by_prompt (text-to-video)
$0.14/second
Bria/video_remove_background (text-to-video)
$0.14/second
PrunaAI/p-image (text-to-image)
$0.005/image
PrunaAI/p-image-Edit (text-to-image)
$0.01/image
bosonai/HiggsAudioV2.5 (text-to-speech)
$20.00 per 1M characters
ResembleAI/chatterbox-turbo (text-to-speech)
$1.00 per 1M characters
What is Deepinfra?
DeepInfra provides a platform for fast, simple, and reliable AI inference, offering developer-friendly APIs to accelerate AI models. It allows users to access and deploy a wide range of machine learning models, including text generation, text-to-image, text-to-video, text-to-speech, automatic speech recognition, embeddings, and rerankers. The platform is designed for performance and cost-efficiency, catering to both startups and enterprises with scalable infrastructure.
DeepInfra focuses on providing tailored inference solutions, optimizing for factors like cost, latency, throughput, and scale. It boasts a zero retention policy for user data, ensuring privacy and compliance with SOC 2 and ISO 27001 certifications. The service runs on its own cutting-edge, inference-optimized infrastructure located in secure US-based data centers, promising better performance and reliability for its users. Additionally, it offers GPU rental for on-demand access to powerful hardware like DGX B200 GPUs.
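The developer workflow described above boils down to a plain HTTP call. Below is a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the exact URL, environment variable name, and response schema are assumptions to verify against DeepInfra's own API documentation:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; confirm against DeepInfra's API docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str):
    """Build the URL, headers, and JSON body for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return API_URL, headers, payload


def run_chat(model: str, prompt: str) -> str:
    """Send the request and return the first completion's text."""
    url, headers, payload = build_chat_request(
        model, prompt, os.environ["DEEPINFRA_API_KEY"]
    )
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries can typically be pointed at it by overriding the base URL.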
Deepinfra FAQ
What is the primary architectural design of the NVIDIA Nemotron 3 Nano model on DeepInfra?
The NVIDIA Nemotron 3 Nano model is built with a hybrid Mixture-of-Experts (MoE) and Mamba architecture. This design is optimized for fast, cost-efficient inference and delivers strong multi-step reasoning capabilities.
How does DeepInfra ensure the privacy and security of user data?
DeepInfra maintains a zero retention policy, meaning user inputs, outputs, and data remain private. The platform is SOC 2 and ISO 27001 certified, adhering to best practices in information security and privacy.
Can I customize the voice for text-to-speech generation using Qwen3-TTS-VoiceDesign?
Yes, Qwen3-TTS-VoiceDesign allows users to describe the desired voice using natural language, rather than selecting from preset options. This enables the model to generate speech in a custom voice based on the text description.
What is the pricing model for language models like DeepSeek-V3.2 on DeepInfra?
DeepSeek-V3.2 and other language models on DeepInfra are priced per 1 million input and output tokens. For DeepSeek-V3.2, the cost is $0.26 per 1M input tokens (or $0.13 cached) and $0.38 per 1M output tokens.
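The per-token arithmetic works out as follows; this is a small sketch using the DeepSeek-V3.2 rates quoted above (the function name is illustrative, not a DeepInfra API):

```python
def inference_cost(input_tokens, output_tokens, cached_tokens=0,
                   in_rate=0.26, out_rate=0.38, cached_rate=0.13):
    """Estimate cost in USD. Rates are dollars per 1M tokens,
    defaulting to the DeepSeek-V3.2 prices quoted on this page."""
    uncached = input_tokens - cached_tokens
    return (uncached * in_rate
            + cached_tokens * cached_rate
            + output_tokens * out_rate) / 1_000_000

# e.g. 100k input tokens (40k of them served from cache) + 20k output tokens:
cost = inference_cost(100_000, 20_000, cached_tokens=40_000)  # ≈ $0.0284
```

Cached input tokens are billed at half the normal input rate here, which is why prompt caching meaningfully reduces cost for workloads with large repeated prefixes.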
What types of hardware and data centers does DeepInfra utilize for its inference infrastructure?
DeepInfra operates on its own cutting-edge, inference-optimized infrastructure. This infrastructure is housed in secure, US-based data centers, providing enhanced performance and reliability.
Source: deepinfra.com