
Deepinfra


Accelerate your AI with developer-friendly APIs for performance and cost-efficient machine learning inference.


TL;DR - Deepinfra

  • Provides APIs for fast, cost-efficient AI model inference.
  • Offers a wide range of machine learning models including text, image, and video generation.
  • Ensures data privacy and security with SOC 2 and ISO 27001 certifications.
Pricing: Free plan available
Best for: Growing teams

Pros & Cons

Pros

  • Fast, simple, and reliable AI inference
  • Cost-efficient with pay-as-you-go pricing
  • Strong focus on data privacy and security (SOC 2, ISO 27001)
  • Wide variety of machine learning models available
  • Scalable infrastructure for various business needs

Cons

  • Pricing can vary significantly per model and usage type (tokens, execution time)
  • No general free tier for model inference; free access is limited to promotions on select models (e.g., limited-time free image editing)


Key Features

  • Developer-friendly APIs for AI inference
  • Access to 100+ machine learning models (text generation, image generation, video generation, speech synthesis, etc.)
  • Pay-as-you-go pricing model
  • On-demand GPU rental (e.g., DGX B200 GPUs)
  • Zero retention policy for inputs, outputs, and user data
  • SOC 2 and ISO 27001 certified
  • Customizable inference solutions (cost, latency, throughput, scale optimization)
  • Proprietary inference-optimized infrastructure in US-based data centers
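The features above center on an HTTP inference API. Below is a minimal Python sketch of a single-turn chat call, assuming an OpenAI-compatible chat-completions endpoint and an API key in a `DEEPINFRA_API_KEY` environment variable; the exact endpoint path, model name, and variable name are assumptions to confirm against the provider's documentation:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; confirm the exact path in the provider docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(prompt: str, model: str = "deepseek-ai/DeepSeek-V3.2") -> dict:
    """Build the JSON body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, model: str = "deepseek-ai/DeepSeek-V3.2") -> str:
    """POST the request and return the model's reply text (requires network access)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI request shape, existing OpenAI-compatible SDKs should also work by pointing their base URL at the provider.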

Pricing Plans

moonshotai/Kimi-K2.5 (text-generation)

$0.45/M in • $2.80/M out

  • 256k context window
  • $0.09 cached / 1M tokens

zai-org/GLM-4.7-Flash (text-generation)

$0.06/M in • $0.40/M out

  • bfloat16
  • 198k context window
  • $0.01 cached / 1M tokens

nvidia/Nemotron-3-Nano-30B-A3B (text-generation)

$0.05/M in • $0.20/M out

  • fp4
  • 256k context window

NVIDIA DGX B200 (GPU rental, on-demand)

$2.49/instance-hour

deepseek-ai/DeepSeek-V3.2 (text-generation)

$0.26/M in • $0.38/M out

  • fp4
  • 160k context window
  • $0.13 cached / 1M tokens

Bria/fibo_edit (text-to-image)

$0.00/image

  • Free for a limited time

Bria/video_eraser (text-to-video)

$0.14/second

Bria/video_foreground_mask (text-to-video)

$0.14/second

Bria/video_increase_resolution (text-to-video)

$0.14/second

Bria/video_mask_by_key_points (text-to-video)

$0.14/second

Bria/video_mask_by_prompt (text-to-video)

$0.14/second

Bria/video_remove_background (text-to-video)

$0.14/second

PrunaAI/p-image (text-to-image)

$0.005/image

PrunaAI/p-image-Edit (text-to-image)

$0.01/image

bosonai/HiggsAudioV2.5 (text-to-speech)

$20.00 per 1M characters

ResembleAI/chatterbox-turbo (text-to-speech)

$1.00 per 1M characters
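Since text-generation prices are quoted per million tokens, the table above can be turned into a simple cost estimator. A sketch with rates hard-coded from the table (cached input tokens bill at the listed cached rate where one exists; actual invoices depend on the provider's own token accounting):

```python
# Per-1M-token rates in USD, copied from the pricing table above.
RATES = {
    "moonshotai/Kimi-K2.5":           {"in": 0.45, "out": 2.80, "cached": 0.09},
    "zai-org/GLM-4.7-Flash":          {"in": 0.06, "out": 0.40, "cached": 0.01},
    "deepseek-ai/DeepSeek-V3.2":      {"in": 0.26, "out": 0.38, "cached": 0.13},
    "nvidia/Nemotron-3-Nano-30B-A3B": {"in": 0.05, "out": 0.20},
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int, tokens_cached: int = 0) -> float:
    """Estimate USD cost for one request; cached input tokens use the cached rate."""
    r = RATES[model]
    fresh_in = tokens_in - tokens_cached
    cost = fresh_in * r["in"] + tokens_out * r["out"]
    # Models without a published cached rate fall back to the normal input rate.
    cost += tokens_cached * r.get("cached", r["in"])
    return cost / 1_000_000
```

For example, a DeepSeek-V3.2 call with one million fresh input tokens and one million output tokens estimates to $0.26 + $0.38 = $0.64.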

What is Deepinfra?

Editorial review
DeepInfra provides a platform for fast, simple, and reliable AI inference, offering developer-friendly APIs to accelerate AI models. It allows users to access and deploy a wide range of machine learning models, including text generation, text-to-image, text-to-video, text-to-speech, automatic speech recognition, embeddings, and rerankers. The platform is designed for performance and cost-efficiency, catering to both startups and enterprises with scalable infrastructure.

DeepInfra focuses on providing tailored inference solutions, optimizing for factors like cost, latency, throughput, and scale. It maintains a zero retention policy for user data, ensuring privacy and compliance with SOC 2 and ISO 27001 certifications. The service runs on its own inference-optimized infrastructure located in secure US-based data centers, promising better performance and reliability for its users. Additionally, it offers GPU rental for on-demand access to powerful hardware like DGX B200 GPUs.



Deepinfra FAQ

What is the primary architectural design of the NVIDIA Nemotron 3 Nano model on DeepInfra?

The NVIDIA Nemotron 3 Nano model is built with a hybrid Mixture-of-Experts (MoE) and Mamba architecture. This design is optimized for fast, cost-efficient inference and delivers strong multi-step reasoning capabilities.

How does DeepInfra ensure the privacy and security of user data?

DeepInfra maintains a zero retention policy, meaning user inputs, outputs, and data remain private. The platform is SOC 2 and ISO 27001 certified, adhering to best practices in information security and privacy.

Can I customize the voice for text-to-speech generation using Qwen3-TTS-VoiceDesign?

Yes, Qwen3-TTS-VoiceDesign allows users to describe the desired voice using natural language, rather than selecting from preset options. This enables the model to generate speech in a custom voice based on the text description.

What is the pricing model for language models like DeepSeek-V3.2 on DeepInfra?

DeepSeek-V3.2 and other language models on DeepInfra are priced per 1 million input and output tokens. For DeepSeek-V3.2, the cost is $0.26 per 1M input tokens (or $0.13 cached) and $0.38 per 1M output tokens.
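As a concrete check of the arithmetic, a hypothetical DeepSeek-V3.2 request with 10,000 fresh input tokens and 2,000 output tokens would cost:

```python
# DeepSeek-V3.2 rates from above: $0.26 per 1M input tokens, $0.38 per 1M output tokens.
cost = (10_000 * 0.26 + 2_000 * 0.38) / 1_000_000
print(f"${cost:.6f}")  # $0.003360
```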

What types of hardware and data centers does DeepInfra utilize for its inference infrastructure?

DeepInfra operates on its own cutting-edge, inference-optimized infrastructure. This infrastructure is housed in secure, US-based data centers, providing enhanced performance and reliability.
