Nessie

Transactional catalog for data lakes with Git-like semantics for consistent data views.

Tracked since 2026

The Bottom Line

Entry price

Free, no paid tier

Biggest pro

Provides strong data consistency and isolation

Biggest con

Requires understanding of Git concepts for full utilization

TL;DR - Nessie

  • Applies Git-like version control to data lakes.
  • Ensures always-consistent, isolated, and atomic data changes.
  • Manages data files and metadata without copying actual data.
Pricing: Free forever
Best for: Individuals & startups

What is Nessie?

Editorial review
Nessie is a transactional catalog for data lakes that brings Git-like version control semantics to data. Users manage data with commits, branches, and tags, much as source code is managed. This gives an always-consistent view across all involved datasets and tables: changes from batch jobs or experiments stay isolated until they are applied atomically.

Nessie does not copy data. It tracks metadata that references existing immutable data files and manages their lifecycle, including garbage collection, so nobody has to track individual data files by hand. This keeps it efficient even with vast amounts of data and numerous tables.

The tool is designed for data engineers and organizations working with large data lakes, especially those using tools like Apache Hive or Apache Spark. Its versioning prevents incomplete changes from becoming visible to users and allows safe experimentation and development without affecting production data.

Available on: Web

Pros & Cons

Pros

  • Provides strong data consistency and isolation
  • Simplifies data management in large data lakes
  • Enables safe experimentation and development environments
  • Eliminates manual tracking of data files
  • Efficient as it does not copy actual data

Cons

  • Requires understanding of Git concepts for full utilization
  • Primarily focused on data lake management, not a general-purpose database

Key Features

  • Git-like semantics for data (commits, branches, tags)
  • Always-consistent view of data across all datasets
  • Isolated changes for batch jobs and experiments
  • Atomic application of data changes
  • Automatic garbage collection of unused data files
  • Support for multi-table transactions
  • References existing immutable data files
  • Integration with Apache Hive and Apache Spark

Pricing Plans

Open Source

Free

  • Transactional Catalog for Data Lakes
  • Git-inspired data version control
  • Cross-table transactions and visibility
  • Open data lake approach, supporting Hive, Spark, Dremio, Trino, etc.
  • Works with Apache Iceberg tables
  • Run as a Docker image or on Kubernetes

Nessie FAQ

How does Nessie ensure data consistency across multiple tables during a commit?

Nessie treats a commit as a multi-table transaction: a single commit can group data file changes from several tables. All changes in that commit are applied atomically, so readers see either every change or none of them, and the view across the affected tables stays consistent.
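To make the idea concrete, here is a minimal in-memory sketch of a multi-table commit. It is a toy model, not Nessie's API: the `Commit` and `ToyCatalog` classes, the table names, and the file paths are all hypothetical and exist only for illustration. The point it shows is that changes to several tables become visible through a single branch-pointer update.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot mapping table name -> tuple of data file paths."""
    tables: Dict[str, Tuple[str, ...]]
    parent: Optional["Commit"] = None
    message: str = ""


class ToyCatalog:
    """Hypothetical stand-in for a versioned catalog (not Nessie's real API)."""

    def __init__(self) -> None:
        # Every branch name points at exactly one commit.
        self.branches: Dict[str, Commit] = {"main": Commit(tables={})}

    def commit(self, branch: str, changes: Dict[str, Tuple[str, ...]], message: str) -> Commit:
        head = self.branches[branch]
        # Build the complete new snapshot first; the old one is never mutated.
        new_head = Commit(tables={**head.tables, **changes}, parent=head, message=message)
        # One pointer swap publishes the changes to all tables at once.
        self.branches[branch] = new_head
        return new_head

    def read(self, branch: str, table: str) -> Tuple[str, ...]:
        return self.branches[branch].tables.get(table, ())


catalog = ToyCatalog()
before = catalog.branches["main"]
catalog.commit(
    "main",
    {
        "orders": ("orders/day1.parquet",),
        "customers": ("customers/day1.parquet",),
    },
    "load day 1",
)
```

A reader holding the `before` snapshot still sees no tables, while a reader resolving `main` after the commit sees both tables together. No intermediate state exposes only one of them.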

Can Nessie be used to manage schema changes in a data lake, such as adding or renaming columns?

Yes. Nessie handles schema changes in tables and views, including adding or removing columns, changing column types, and renaming columns. Because it tracks the metadata associated with data files, it can manage these structural changes while keeping the data consistent.

What happens to data files that are no longer referenced by any commit or branch in Nessie?

Nessie includes an automatic garbage collection mechanism. It knows which data files are actively being used and which are no longer referenced by any commit or branch. These unreferenced data files can then be safely and automatically removed, optimizing storage and preventing data clutter.
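The reachability idea behind this can be sketched as a small mark-and-sweep pass. This is a simplified illustration under stated assumptions, not Nessie's garbage collector: the commit dictionary layout, the function names, and the file names are hypothetical, and real retention policies (such as keeping only recent history) are not modeled.

```python
from typing import Dict, Iterable, Optional, Set, TypedDict


class CommitRecord(TypedDict):
    files: Set[str]          # data files referenced by this commit
    parent: Optional[str]    # parent commit id, or None for the root


def live_files(commits: Dict[str, CommitRecord], heads: Iterable[str]) -> Set[str]:
    """Collect every file referenced by a commit reachable from any branch head."""
    live: Set[str] = set()
    seen: Set[str] = set()
    for head in heads:
        commit_id: Optional[str] = head
        # Walk the parent chain; stop early on history already visited.
        while commit_id is not None and commit_id not in seen:
            seen.add(commit_id)
            live |= commits[commit_id]["files"]
            commit_id = commits[commit_id]["parent"]
    return live


def unreferenced(all_files: Iterable[str], commits: Dict[str, CommitRecord],
                 heads: Iterable[str]) -> Set[str]:
    """Files present in storage that no reachable commit points at."""
    return set(all_files) - live_files(commits, heads)


commits: Dict[str, CommitRecord] = {
    "c1": {"files": {"a.parquet"}, "parent": None},
    "c2": {"files": {"a.parquet", "b.parquet"}, "parent": "c1"},
    # c3 belonged to an experiment branch that has since been deleted.
    "c3": {"files": {"a.parquet", "x.parquet"}, "parent": "c1"},
}
heads = ["c2"]  # only main survives, pointing at c2
all_files = {"a.parquet", "b.parquet", "x.parquet"}

garbage = unreferenced(all_files, commits, heads)
```

Here only `x.parquet` ends up in `garbage`, because the single commit that referenced it is no longer reachable from any branch.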

How does Nessie integrate with existing data processing frameworks like Apache Spark or Apache Hive?

Nessie is designed for easy integration with existing data processing tools. For frameworks like Apache Spark or Apache Hive, integrating Nessie typically involves a simple configuration change rather than requiring modifications to existing production code. This allows current jobs to leverage Nessie's versioning capabilities without significant refactoring.
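As a rough illustration of what such a configuration change can look like for Spark with Apache Iceberg tables, the sketch below assembles the catalog properties as a plain dictionary. The property names follow the pattern used in the Iceberg and Nessie documentation, but treat them as assumptions and check them against the versions you run. The helper name `nessie_spark_conf`, the server URI, and the warehouse path are placeholders, not values from this page.

```python
from typing import Dict


def nessie_spark_conf(catalog_name: str, uri: str, ref: str, warehouse: str) -> Dict[str, str]:
    """Build Spark catalog properties for a Nessie-backed Iceberg catalog.

    Hypothetical helper: key names should be verified against the
    Iceberg/Nessie release in use.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,              # Nessie server endpoint (placeholder)
        f"{prefix}.ref": ref,              # branch or tag the session reads and writes
        f"{prefix}.warehouse": warehouse,  # storage location for table data (placeholder)
    }


conf = nessie_spark_conf(
    catalog_name="nessie",
    uri="http://localhost:19120/api/v1",
    ref="main",
    warehouse="s3://example-bucket/warehouse",
)
```

Each key and value pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created. The required Iceberg and Nessie runtime jars are not covered here. Existing job code would only need to address tables through the configured catalog name.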

Since Nessie doesn't copy data, how does it handle updates or deletions of existing data within a data file?

Data files in a data lake are immutable. When data inside an existing file must be updated or deleted, the original file is read, the changes are applied, and a new data file containing the modified data is written. Nessie then references the new file, while the original becomes irrelevant once no commit points to it and is eventually subject to garbage collection.
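The copy-on-write pattern described here can be sketched in a few lines. This is a toy model under stated assumptions: storage is a dictionary of path to rows, and the function name, paths, and row fields are hypothetical. In a real deployment the rewrite is carried out by the query engine and table format, with the catalog recording which files the table now references.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def rewrite_without(
    storage: Dict[str, List[Row]],
    table_files: Tuple[str, ...],
    path: str,
    new_path: str,
    should_delete: Callable[[Row], bool],
) -> Tuple[str, ...]:
    """Delete rows by writing a new file; the original file is left untouched."""
    kept = [row for row in storage[path] if not should_delete(row)]
    storage[new_path] = kept  # write the replacement file
    # Return a new file list that swaps the old path for the new one.
    return tuple(new_path if p == path else p for p in table_files)


storage: Dict[str, List[Row]] = {
    "orders/f1.parquet": [{"id": 1, "status": "ok"}, {"id": 2, "status": "cancelled"}],
    "orders/f2.parquet": [{"id": 3, "status": "ok"}],
}
old_files = ("orders/f1.parquet", "orders/f2.parquet")

new_files = rewrite_without(
    storage,
    old_files,
    path="orders/f1.parquet",
    new_path="orders/f1-v2.parquet",
    should_delete=lambda row: row["status"] == "cancelled",
)
```

After the call, `old_files` still describes the table as it was, so older commits remain readable, while `new_files` references the rewritten file. The original `orders/f1.parquet` only becomes a garbage collection candidate once no commit references it.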