
Run ML models in the cloud
Visit WebsiteTL;DR - Replicate
- Cloud API to run and fine-tune thousands of open-source AI models without managing GPUs
- Pay-per-second pricing from $0.0001/sec (CPU) to $0.012/sec (8x H100) with auto-scaling to zero
- Best for developers building AI features who want model variety without infrastructure overhead
Pros & Cons
Pros
- No infrastructure management required, run GPU models with a single API call
- Scale-to-zero billing means no cost during idle periods
- Thousands of pre-built community models ready for immediate use
- Fine-tuning support lets teams customize models on proprietary data
- Open-source Cog tool makes packaging custom models straightforward
- Broad hardware selection from CPUs to 8x H100 GPU clusters
Cons
- Per-second pricing can get expensive at high sustained usage volumes
- Cold start latency when models scale up from zero
- Limited control over underlying infrastructure and hardware selection
- Private model deployments charge for idle time unlike public models
- No SLA or guaranteed uptime outside enterprise agreements
Key Features
Pricing Plans
Pay-as-you-go (Public Models)
Usage-based
- CPU: $0.0001/sec
- Nvidia T4 GPU: $0.000225/sec
- Nvidia L40S GPU: $0.000975/sec
- Up to 8x H100 GPU: $0.0112/sec
- Image models: $0.025–$0.09 per output
- LLMs: $3.00–$3.75 per million input tokens
- Video models: $0.09–$0.25 per second of output
- Scale to zero — no charge when idle
- Thousands of community models included
Dedicated Hardware (Private Models)
From $0.09/hr
- CPU Small: $0.09/hr ($0.000025/sec)
- Up to 8x H100 GPU: $43.92/hr ($0.0122/sec)
- Dedicated instances for custom models
- Pay for all time instances are online including idle
- Fast-booting fine-tunes exempt from idle charges
Enterprise
Custom
- Volume discounts
- Dedicated support
- Custom SLAs
- Contact sales for pricing
What is Replicate?
Reviews
Be the first to review Replicate
Your take helps the next buyer. Verified LinkedIn reviewers get a badge.
Write a reviewBest Replicate Alternatives
Top alternatives based on features, pricing, and user needs.
AI community and platform
Open-source AI model platform
The end-to-end AI cloud that simplifies building and deploying models with GPU infrastructure.
Manage, store, and distribute software applications, AI/ML models, and components at scale.
Ultra-low latency batched inference for Generative AI at datacenter scale.
The scalable MLOps platform enabling CI/CD for ML and pipeline automation on-prem and any-cloud.
Build, train, and deploy AI/ML models on accelerated cloud GPUs with simplicity and scalability.
The fastest AI inference and reasoning on GPUs with unified control for production AI.
Explore More
Replicate FAQ
What programming languages does Replicate support?
Do I need ML expertise to use Replicate?
How does Replicate pricing work?
Can I deploy my own custom models on Replicate?
What is the cold start time for models?
Does Replicate offer a free tier?
Source: replicate.com