
Inferless
Deploy and scale machine learning models on serverless GPUs in minutes.
TL;DR - Inferless
- Deploys machine learning models to serverless GPUs rapidly.
- Automatically scales GPU resources from zero to hundreds based on demand.
- Offers usage-based billing and fast cold starts for cost-effective inference.
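Usage-based billing combined with scale-to-zero means cost tracks only the seconds a GPU is actually serving. The minimal sketch below assumes billing accrues per active second at one flat rate (the $0.000092/sec T4 Shared price listed under Pricing Plans) and that idle periods with zero replicas cost nothing. The `ActiveWindow` type and the sample intervals are hypothetical, and Inferless's actual metering (for example, how cold-start seconds are counted) may differ.

```python
from dataclasses import dataclass

# Assumed flat per-second rate: the T4 Shared price listed on this page.
T4_SHARED_RATE_PER_SEC = 0.000092


@dataclass
class ActiveWindow:
    """Hypothetical period (seconds since midnight) with at least one replica running."""
    start: int
    end: int

    @property
    def seconds(self) -> int:
        return self.end - self.start


def usage_cost(windows: list[ActiveWindow], rate_per_sec: float) -> float:
    """Bill only the seconds covered by active windows; scaled-to-zero gaps cost nothing."""
    active_seconds = sum(w.seconds for w in windows)
    return active_seconds * rate_per_sec


# A bursty day: two busy hours in the morning and a 30-minute spike in the evening.
day = [
    ActiveWindow(start=9 * 3600, end=11 * 3600),
    ActiveWindow(start=20 * 3600, end=20 * 3600 + 1800),
]
print(f"usage-based: ${usage_cost(day, T4_SHARED_RATE_PER_SEC):.4f}")
print(f"always-on:   ${86_400 * T4_SHARED_RATE_PER_SEC:.4f}")
```

Under these assumptions the bursty day bills 9,000 active seconds (about $0.83), versus about $7.95 for a replica kept running for all 86,400 seconds.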
Pros & Cons
Pros
- Eliminates infrastructure management for GPU clusters
- Scales automatically with workload, paying only for usage
- Achieves sub-second cold starts for large models
- Provides significant cost savings compared to traditional GPU clusters
- Offers enterprise-grade security with SOC-2 Type II certification
Cons
- Enterprise pricing is not published and requires contacting Inferless directly
- Some offerings are in private beta and require joining a waitlist
Pricing Plans
Starter (Free Trial)
$0.000555/sec
- Designed for small teams and independent developers
- Deploy models in minutes without worrying about the cost
Enterprise
Contact us
- Built for fast-growing startups and larger organizations
- Scale quickly at an affordable cost with desired latency results
Nvidia T4 Dedicated
$0.000185/sec
- GPU RAM: 16GB
- vCPUs: 3x
- RAM: 20GB
Nvidia A10 Dedicated
$0.000341/sec
- GPU RAM: 24GB
- vCPUs: 7x
- RAM: 30GB
Nvidia A100 Dedicated
$0.001491/sec
- GPU RAM: 80GB
- vCPUs: 20x
- RAM: 200GB
Nvidia T4 Shared
$0.000092/sec
- GPU RAM: 8GB
- vCPUs: 1.5x
- RAM: 10GB
Nvidia A10 Shared
$0.000170/sec
- GPU RAM: 12GB
- vCPUs: 3x
- RAM: 15GB
Nvidia A100 Shared
$0.000745/sec
- GPU RAM: 40GB
- vCPUs: 10x
- RAM: 100GB
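Because every tier is billed per second, converting the rates to hourly figures makes the tiers easier to compare. The sketch below uses only the per-second prices and GPU RAM sizes listed above; the 730-hour month is an assumption for a replica that never scales down, and the output is plain arithmetic, not a quote.

```python
# Per-second price (USD) and GPU RAM (GB) for each tier, as listed on this page.
TIERS = {
    "T4 Dedicated": (0.000185, 16),
    "A10 Dedicated": (0.000341, 24),
    "A100 Dedicated": (0.001491, 80),
    "T4 Shared": (0.000092, 8),
    "A10 Shared": (0.000170, 12),
    "A100 Shared": (0.000745, 40),
}

SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730  # assumption: average month with the replica always running


def hourly_cost(rate_per_sec: float) -> float:
    """Convert a per-second GPU price to a per-hour price."""
    return rate_per_sec * SECONDS_PER_HOUR


for name, (rate, gpu_ram_gb) in TIERS.items():
    per_hour = hourly_cost(rate)
    print(
        f"{name:15s} ${per_hour:6.3f}/h  "
        f"${per_hour * HOURS_PER_MONTH:8.2f}/month always-on  "
        f"${per_hour / gpu_ram_gb:.4f} per GB of GPU RAM per hour"
    )
```

Each shared tier costs just under half its dedicated counterpart (for example, about $0.33/h versus $0.67/h for the T4) and provides half the GPU RAM, so the price per GB of GPU memory is nearly identical. By this arithmetic alone, the choice comes down to whether the model fits in the smaller memory slice.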
Volume Pricing - Storage
50 GB free per month, then $0.30/GB/month
- 50 GB free every month
- Additional storage costs $0.30/GB/month
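The storage tier is a simple free allowance followed by a linear rate. The sketch below assumes the bill is computed on the GB stored over a month, with no proration or rounding. The function name and the 120 GB example are hypothetical.

```python
FREE_STORAGE_GB = 50        # free allowance per month, as listed
PRICE_PER_GB_MONTH = 0.30   # USD per GB per month beyond the allowance


def monthly_storage_cost(stored_gb: float) -> float:
    """One month of storage: the first 50 GB are free, the remainder is billed per GB."""
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB)
    return billable_gb * PRICE_PER_GB_MONTH


# Hypothetical: 120 GB of model artifacts leaves 70 billable GB.
print(f"${monthly_storage_cost(120):.2f}")
```

Anything at or under 50 GB costs nothing. Above the allowance, each extra GB adds $0.30 per month.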
Join Waitlist (Startup)
Contact us
- Minimum of 10,000 inference requests per month
- Unlimited deployed webhook endpoints
- GPU concurrency of 5
- 15 days of log retention
- Support via a private Slack Connect channel, with responses within 48 working hours
- Included credits: $30
Get Early Access (Enterprise)
Contact us
- Minimum of 100,000 inference requests per month
- Unlimited deployed webhook endpoints
- GPU concurrency of 50
- 365 days of log retention
- Support via a private Slack Connect channel and a support engineer
- Included credits: custom
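The two waitlist/early-access plans differ mainly in minimum monthly volume, GPU concurrency, and log retention. This illustrative sketch treats all three as hard constraints and checks which plan a workload fits. It assumes that "minimum requests" is a volume commitment and that "GPU concurrency" caps simultaneous GPU work. Both readings are interpretations, not documented behavior. The `Plan` type and `eligible_plans` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    min_monthly_requests: int
    gpu_concurrency: int
    log_retention_days: int


# Limits as listed on this page for the Startup and Enterprise early-access plans.
PLANS = [
    Plan("Startup", 10_000, 5, 15),
    Plan("Enterprise", 100_000, 50, 365),
]


def eligible_plans(monthly_requests: int, peak_concurrency: int, retention_days: int) -> list[str]:
    """Return plans whose minimum volume is met and whose limits cover the workload."""
    return [
        p.name
        for p in PLANS
        if monthly_requests >= p.min_monthly_requests
        and peak_concurrency <= p.gpu_concurrency
        and retention_days <= p.log_retention_days
    ]


print(eligible_plans(20_000, 3, 7))     # small, low-concurrency workload
print(eligible_plans(200_000, 20, 90))  # larger workload with longer log retention
```

A workload that falls between the plans (for example, 50,000 requests a month but 30 days of log retention) matches neither under these assumptions. That is the kind of case the listing routes to "Contact us".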
Best Inferless Alternatives
Top alternatives based on features, pricing, and user needs.
High-performance AI infrastructure for developers to deploy, train, and scale ML workloads.
Platform for web developers
Serverless GPUs for AI
GPU serverless for ML
ML model deployment platform
Accelerate AI model inference with optimized compilation and serverless deployment.
Inferless FAQ
How does Inferless manage GPU sharing and elasticity, given that Kubernetes typically doesn't allow for direct GPU sharing?
What is the practical difference in performance and cost between a 'Shared' and 'Dedicated' GPU instance on Inferless?
Can I deploy a model that requires specific pre-processing and post-processing functions alongside the model file itself?
How does Inferless achieve a 99% reduction in model cold start times, particularly for large models like GPT-J?
What security measures are in place to ensure data and model isolation for customers using Inferless?
If my model has varying inference request patterns, how does Inferless's billing model ensure I only pay for what I use, especially if there are periods of no activity?
Source: inferless.com