Ollama v0.19

Run large language models locally on your machine with enhanced performance.

TL;DR - Ollama v0.19

  • Runs large language models locally on your hardware.
  • Offers significant performance boosts on Apple Silicon with MLX integration.
  • Provides both free local usage and paid cloud plans for advanced needs.
Pricing: Free plan available
Best for: Growing teams

Pros & Cons

Pros

  • Enables private and secure local execution of LLMs without data logging.
  • Offers significant performance improvements on Apple Silicon devices.
  • Provides a flexible pricing model with a robust free tier for local usage.
  • Supports advanced quantization for better model accuracy and efficiency.
  • Features intelligent caching for faster and more responsive agentic workflows.

Cons

  • High-performance features like MLX acceleration are specific to Apple Silicon.
  • Cloud model usage is metered and has limits based on the subscription plan.
  • Requires specific hardware (e.g., >32GB unified memory for some models) for optimal local performance.

Key Features

  • Local execution of large language models
  • CLI and API for model interaction
  • Support for Apple Silicon via the MLX framework for accelerated performance
  • NVFP4 quantization support for higher accuracy and reduced memory use
  • Improved caching for efficient coding and agentic tasks
  • Access to 40,000+ community integrations
  • Unlimited public models for local use
  • Cloud model access with varying concurrency and usage limits

Pricing Plans

Free

$0

  • Automate coding, document analysis, and other tasks with open models
  • Keep your data private
  • Run models on your hardware
  • Access cloud models
  • CLI, API, and desktop apps
  • 40,000+ community integrations
  • Unlimited public models
  • Run 1 cloud model at a time

Pro

$20/mo

  • Everything in Free, plus:
  • Run 3 cloud models at a time
  • 50x more cloud usage than Free
  • Upload and share private models

Max

$100/mo

  • Everything in Pro, plus:
  • Run 10 cloud models at a time
  • 5x more usage than Pro

What is Ollama v0.19?

Editorial review
Ollama is a platform that allows users to download, run, and manage large language models (LLMs) directly on their local hardware. It provides a command-line interface (CLI) and API for interacting with these models, enabling tasks like coding automation, document analysis, and personal assistants. Ollama emphasizes privacy by keeping data local and offers access to a vast library of open models. The platform is designed for developers, researchers, and anyone looking to leverage the power of LLMs without relying solely on cloud services. Recent updates, particularly for Apple Silicon users, have significantly boosted performance by integrating with Apple's MLX framework, leading to faster response times and more efficient resource utilization. Ollama also supports advanced quantization formats like NVFP4 for higher model accuracy and production parity. Beyond local execution, Ollama offers optional cloud plans for more demanding workloads, providing access to a curated list of cloud-enabled models with varying usage limits and concurrency options. These cloud plans are designed to scale with user needs, from light usage for experimentation to heavy, sustained tasks for continuous agent workflows, all while maintaining a strong commitment to data privacy.

Ollama v0.19 FAQ

How does Ollama leverage Apple's MLX framework to improve performance on Apple Silicon?

Ollama integrates with Apple's MLX framework to take advantage of Apple Silicon's unified memory architecture and the GPU Neural Accelerators on M5, M5 Pro, and M5 Max chips. This significantly accelerates both time to first token (TTFT) and generation speed (tokens per second) for LLMs running on Apple Silicon devices.
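
For context, MLX is Apple's open-source array framework for Apple Silicon: arrays live in unified memory shared by the CPU and GPU, and computation is lazy. Ollama uses it internally, but a tiny standalone example (assuming `pip install mlx` on an Apple Silicon Mac) shows the execution model the answer refers to:

```python
import mlx.core as mx

# Arrays are allocated in unified memory, visible to both CPU and GPU;
# operations are recorded lazily and run (on the GPU by default) at eval.
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))
c = a @ b      # queued, not yet computed
mx.eval(c)     # forces the computation
print(c.shape) # prints the result's shape
```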

What is NVFP4 support, and how does it benefit Ollama users?

NVFP4 support means Ollama can leverage NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements for inference. This allows users to achieve results consistent with production environments and opens up the ability to run models optimized by NVIDIA's model optimizer.
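
At a high level, NVFP4 stores weights as 4-bit floating-point (E2M1) codes, with small groups of values sharing a scale factor; that per-block scaling is what lets a 4-bit format retain accuracy. The NumPy toy below illustrates only the block-scaling idea and is not Ollama's or NVIDIA's implementation (the real format packs the 4-bit codes and stores the block scales themselves in FP8):

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float, the element format
# underlying NVFP4 (the sign bit is handled separately below).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_blocks(weights: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Toy block quantizer: each block of `block_size` weights shares one
    scale, chosen so the block's largest magnitude maps to the top FP4
    value (6.0). Returns the dequantized weights for error inspection."""
    out = np.empty_like(weights)
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        scale = max(np.abs(block).max() / FP4_GRID[-1], 1e-12)
        # Snap each scaled magnitude to the nearest grid point, restore sign.
        nearest = FP4_GRID[np.abs(np.abs(block[:, None]) / scale - FP4_GRID).argmin(axis=1)]
        out[i:i + block_size] = np.sign(block) * nearest * scale
    return out

w = np.random.randn(64)
print("mean abs error:", np.abs(w - quantize_blocks(w)).mean())
```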

How do Ollama's improved caching mechanisms enhance efficiency for coding and agentic tasks?

Ollama's upgraded cache reuses cached state across conversations, lowering memory utilization and increasing cache hits, especially when conversations share a system prompt. It also stores intelligent checkpoints within the prompt, cutting reprocessing time for faster responses, and shared prefixes survive longer in the cache even when older conversation branches are dropped.
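
Conceptually, this kind of prompt caching works like longest-prefix matching over tokenized input: if a new request shares a prefix with something already processed (such as a common system prompt), only the unseen suffix has to be computed. The sketch below is a hypothetical illustration of that idea, not Ollama's internal code; `model_step` stands in for one token of real model computation:

```python
# Hypothetical prefix cache: maps a token prefix to the opaque model state
# (e.g. a KV cache) produced after processing it. Illustrative only.
cache: dict[tuple, object] = {}

def longest_cached_prefix(tokens: list):
    """Find the longest already-processed prefix of `tokens`."""
    for end in range(len(tokens), 0, -1):
        state = cache.get(tuple(tokens[:end]))
        if state is not None:
            return end, state
    return 0, None

def process_prompt(tokens: list, model_step):
    """Process a prompt, recomputing only the part not covered by the cache."""
    hit, state = longest_cached_prefix(tokens)
    for i in range(hit, len(tokens)):          # only the new suffix runs
        state = model_step(state, tokens[i])
        cache[tuple(tokens[:i + 1])] = state   # checkpoint each prefix
    return state

# With a shared system prompt, the second call reuses the cached prefix:
system = ["You", "are", "helpful", "."]
process_prompt(system + ["Hi"], lambda s, t: (s, t))
process_prompt(system + ["Write", "code"], lambda s, t: (s, t))  # prefix hit
```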

What are the key differences in cloud usage and concurrency between the Free, Pro, and Max plans?

The Free plan allows 1 concurrent cloud model and light usage. The Pro plan offers 3 concurrent cloud models and 50x more usage than Free, suitable for day-to-day work. The Max plan provides 10 concurrent cloud models and 5x more usage than Pro, designed for heavy, sustained tasks and continuous agent workflows. Local model usage is unlimited across all plans.

Can I use Ollama with custom fine-tuned models, and what are the plans for easier import?

While the current preview release focuses on specific models, Ollama is actively working to support future models and will introduce an easier way to import custom models fine-tuned on supported architectures. In the meantime, they plan to expand the list of supported architectures.

Source: ollama.com
