What is Fireworks AI?
Fireworks AI is a cloud inference platform for running open-source generative AI models. It provides serverless API endpoints for over 400 models including LLMs, image generators, vision models, and audio models, with no cold starts or GPU management required.
How much does Fireworks AI cost?
Serverless inference starts at $0.10 per 1M tokens for models under 4B parameters. Larger models cost more: $0.20 per 1M tokens for 4B-16B models and $0.90 per 1M tokens for models above 16B. On-demand GPU deployments range from $2.90/hour for A100s to $9.00/hour for B200s. New users receive $1 in free credits.
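The tiered rates above make monthly cost easy to estimate. A minimal sketch, using the per-1M-token rates quoted in this answer (the helper name and the choice of which tier the 16B boundary falls in are illustrative assumptions, not an official API):

```python
def serverless_cost_usd(tokens: int, model_params_b: float) -> float:
    """Estimated cost in USD for `tokens` tokens on a model of
    `model_params_b` billion parameters, using the per-1M-token
    tiers quoted above. The exact tier of a 16B model is assumed."""
    if model_params_b < 4:
        rate = 0.10
    elif model_params_b <= 16:
        rate = 0.20
    else:
        rate = 0.90
    return tokens / 1_000_000 * rate

# e.g. 50M tokens on an 8B model falls in the $0.20 tier:
print(serverless_cost_usd(50_000_000, 8))  # 10.0
```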
What models does Fireworks AI support?
Fireworks hosts 400+ models including DeepSeek V3, Qwen, Meta Llama, GLM-4, Kimi K2.5, Gemma 3, FLUX.1 for image generation, and Whisper V3 for audio. New open-source models are typically added within days of release.
Can I fine-tune models on Fireworks AI?
Yes. Fireworks supports supervised fine-tuning (SFT) and direct preference optimization (DPO). Pricing starts at $0.50 per 1M tokens for SFT on models up to 16B parameters, scaling up for larger models. Quantization-aware tuning is also available.
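SFT datasets for chat models are commonly supplied as JSONL, one training example per line in chat-message format. A minimal sketch of one such record, assuming Fireworks follows this common convention (the field names here are the conventional ones, not confirmed from Fireworks docs):

```python
import json

# One SFT training example in chat-message form. Each line of the
# .jsonl training file would be one JSON object like this.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}

line = json.dumps(example)  # serialize to a single JSONL line
print(line.startswith('{"messages"'))  # True
```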
Is Fireworks AI compatible with the OpenAI API?
Yes. Fireworks provides an OpenAI-compatible API, allowing developers to switch from OpenAI by changing the base URL and API key. This makes it straightforward to migrate existing applications or use Fireworks as a fallback provider.
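The swap amounts to changing the base URL and API key. The sketch below uses only the Python standard library to build the same chat-completions request an OpenAI SDK client would send, pointed at Fireworks instead of api.openai.com; the base URL, endpoint path, and model id are assumptions that should be checked against current Fireworks docs:

```python
import json
import urllib.request

# Assumed Fireworks OpenAI-compatible base URL; with the official
# OpenAI SDK you would pass this as `base_url` plus your Fireworks key.
BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(api_key, model, messages):
    """Build the POST request for the OpenAI-style /chat/completions
    endpoint, identical in shape to an OpenAI API call except for the host."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request(
    "fw-...",  # placeholder Fireworks API key
    "accounts/fireworks/models/llama-v3p1-8b-instruct",  # illustrative model id
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

Because the request shape matches OpenAI's, existing client code, retries, and streaming logic carry over unchanged; only credentials and host differ.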
What compliance certifications does Fireworks AI have?
Fireworks AI is SOC 2, HIPAA, and GDPR compliant. It offers zero data retention options for sensitive workloads, and enterprise customers can deploy on their own cloud infrastructure via bring-your-own-cloud arrangements.