Visualizes data dependencies and flow across pipelines.
Open-source and compatible with OpenLineage integrations.
Pricing: Free forever
Best for: Individuals & startups
Pros & Cons
Pros
Provides clear visual data lineage for complex systems
Open-source and extensible with a modular design
Supports real-time metadata collection
Enables automation of tasks like backfills and root cause analysis
Reference implementation for OpenLineage, ensuring broad compatibility
Cons
Requires integrating your existing data processing tools (for example, through OpenLineage) before any lineage appears
May have a learning curve for new users unfamiliar with data lineage concepts
Support is community-driven and may be slower to respond than a commercial vendor's
Key Features
OpenLineage-compatible metadata server
Unified visual graph for data interdependencies
Flexible Lineage API for querying metadata
Metadata Repository for historical job and dataset data
Metadata UI for dataset discovery and dependency exploration
Immutable data model with versioned jobs and datasets
Integration with Apache Airflow, Apache Spark, Apache Flink, dbt, and Dagster
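As a rough illustration of the Lineage API, the sketch below builds a lineage query for one dataset or job and fetches the surrounding graph. It assumes a Marquez API reachable at http://localhost:5000 (a common quickstart default; adjust for your deployment) and the `GET /api/v1/lineage` endpoint with `nodeId` and `depth` parameters. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a locally running Marquez API; change this for your deployment.
MARQUEZ_URL = "http://localhost:5000"


def lineage_url(node_type: str, namespace: str, name: str, depth: int = 20) -> str:
    """Build a lineage query URL for a dataset or job node (hypothetical helper)."""
    if node_type not in ("dataset", "job"):
        raise ValueError("node_type must be 'dataset' or 'job'")
    # Node IDs take the form '<type>:<namespace>:<name>'.
    node_id = f"{node_type}:{namespace}:{name}"
    query = urllib.parse.urlencode({"nodeId": node_id, "depth": depth})
    return f"{MARQUEZ_URL}/api/v1/lineage?{query}"


def fetch_lineage(node_type: str, namespace: str, name: str, depth: int = 20) -> dict:
    """GET the lineage graph around one node and decode the JSON response."""
    with urllib.request.urlopen(lineage_url(node_type, namespace, name, depth)) as resp:
        return json.load(resp)
```

For example, fetch_lineage("dataset", "my_namespace", "public.orders", depth=2) would request the graph within two hops of that (hypothetical) dataset. Check the API reference for your Marquez version before relying on the exact response shape.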
Pricing
Free
Marquez is completely free to use with no hidden costs.
Marquez is an open-source metadata server that provides real-time collection of information from running jobs and applications, acting as the reference implementation for OpenLineage. It offers a unified visual graph through a web user interface, allowing users to explore complex interdependencies within their data ecosystem, trace lineage, and analyze performance metrics.
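Because Marquez serves as the OpenLineage reference backend, jobs report metadata by sending OpenLineage run events to it. The following is a minimal sketch, assuming a Marquez API at http://localhost:5000 that accepts events on `POST /api/v1/lineage`. The producer URI is a placeholder, and the schema URL version is an assumption to check against the OpenLineage spec version you target.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

MARQUEZ_URL = "http://localhost:5000"  # assumption: local Marquez API
PRODUCER = "https://example.com/my-pipeline"  # hypothetical producer URI
# Assumption: spec version; match it to the OpenLineage version you use.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def new_run_id() -> str:
    """OpenLineage run IDs are UUIDs; one ID ties together all events of a run."""
    return str(uuid.uuid4())


def run_event(event_type, run_id, job_namespace, job_name, inputs=(), outputs=()):
    """Build a minimal OpenLineage RunEvent as a plain dict.

    inputs/outputs are iterables of (namespace, name) pairs for datasets.
    """
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def send_event(event: dict) -> int:
    """POST one event to Marquez and return the HTTP status code."""
    req = urllib.request.Request(
        f"{MARQUEZ_URL}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A job would typically send a START event when it begins and a COMPLETE (or FAIL) event with the same run ID when it finishes. In practice, the OpenLineage integrations for Airflow, Spark, and similar tools emit these events for you, so hand-written events are mainly useful for custom jobs.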
The platform is designed as a modular, highly scalable, and extensible solution for metadata management. It consists of a Metadata Repository for storing job and dataset metadata, a RESTful Metadata API for interaction, and a Metadata UI for discovery and dependency graph exploration. Marquez emphasizes an immutable data model, tracking versioned jobs and datasets to ensure reproducibility and provide powerful visualizations of data flow. It's ideal for data engineers, data scientists, and anyone managing complex data pipelines who needs to understand data provenance, automate tasks like backfills, and perform root cause analysis.
What is Marquez?
Marquez is an open-source metadata server that collects real-time information from data jobs and applications. It provides data lineage, showing how data is produced, consumed, and transformed across various pipelines, and offers a visual interface to explore these dependencies.
How much does Marquez cost?
Marquez is an open-source project, meaning it is free to use.
Is Marquez free?
Yes, Marquez is an open-source project and is completely free to use.
Who is Marquez for?
Marquez is for data engineers, data scientists, and organizations that need to manage, understand, and visualize data lineage across complex data ecosystems. It helps in tasks like data governance, impact analysis, root cause analysis, and enriching data catalogs.
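For impact analysis or root cause analysis, the lineage graph returned by the Lineage API can be walked in either direction. The sketch below assumes the response's `graph` list contains nodes with `id`, `inEdges`, and `outEdges`, where each edge has `origin` and `destination` node IDs. Treat that shape as an assumption and verify it against your Marquez version.

```python
from collections import deque

# Per direction: which edge list to follow and which end of each edge to visit.
_DIRECTIONS = {
    "downstream": ("outEdges", "destination"),
    "upstream": ("inEdges", "origin"),
}


def walk_lineage(graph: list, start_id: str, direction: str = "downstream") -> list:
    """Breadth-first traversal returning reachable node IDs in visit order."""
    if direction not in _DIRECTIONS:
        raise ValueError("direction must be 'downstream' or 'upstream'")
    edge_key, end_key = _DIRECTIONS[direction]
    # Index neighbors by node ID so each lookup is a single dict access.
    neighbors = {
        node["id"]: [edge[end_key] for edge in node.get(edge_key, [])]
        for node in graph
    }
    seen = {start_id}
    visited = []
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:  # the seen set also guards against cycles
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
    return visited
```

Walking downstream from a dataset lists the jobs and datasets a change could affect; walking upstream from a bad output lists candidate root causes. Filtering the result with id.startswith("dataset:") keeps only datasets.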