Key Features
Push-based micro-batch and pull-based streaming data ingestion
Real-time upsert, append, and pre-aggregation storage engine
Columnar storage engine, MPP architecture, cost-based query optimizer, vectorized execution engine for high-performance queries
Federated querying of data lakes (Hive, Iceberg, Hudi) and databases (MySQL, PostgreSQL)
Support for compound data types (Array, Map, JSON) and Variant data type for auto JSON inference
NGram bloomfilter and inverted index for text searches
Distributed design for linear scalability, workload isolation, and tiered storage
Compatible with MySQL protocol and ANSI SQL for BI tool integration
Pricing
Free
Apache Doris is completely free to use with no hidden costs.
Apache Doris is an open-source, real-time analytical database designed for high-performance data analytics and search. It supports both micro-batch and streaming data ingestion, allowing for real-time updates, appends, and pre-aggregation of data. The database is optimized for high-concurrency and high-throughput queries, leveraging a columnar storage engine, Massively Parallel Processing (MPP) architecture, a cost-based query optimizer, and a vectorized execution engine.
This database is ideal for organizations needing to perform real-time analytics on large datasets, especially those integrating with data lakes (like Hive, Iceberg, Hudi) and traditional databases (MySQL, PostgreSQL). It caters to developers and data engineers who require a scalable, distributed system capable of handling complex data types, text searches, and seamless integration with BI tools and external compute engines like Spark and Flink. Its distributed design ensures linear scalability and efficient resource management through workload isolation and tiered storage, supporting both shared-nothing and storage-compute separation architectures.
How does Apache Doris handle real-time data ingestion and updates?
Apache Doris supports both push-based micro-batch and pull-based streaming data ingestion, so newly ingested data becomes queryable within a second. Its storage engine is designed for real-time upsert, append, and pre-aggregation operations, which keeps data fresh without a separate batch-rebuild step.
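To make the difference between append, upsert, and pre-aggregation concrete, here is a minimal Python sketch of the three write behaviors. It is a toy model only: the function names and the sample events are hypothetical, it uses a sum as the example aggregate, and it does not reflect how the Doris storage engine is implemented.

```python
from collections import defaultdict


def append(rows):
    """Append semantics: every ingested row is kept, duplicates included."""
    return list(rows)


def upsert(rows):
    """Upsert semantics: the latest row for a key replaces earlier ones."""
    latest = {}
    for key, value in rows:
        latest[key] = value
    return sorted(latest.items())


def pre_aggregate(rows):
    """Pre-aggregation semantics: rows sharing a key are summed on write."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


# Hypothetical (key, value) events arriving in ingestion order.
events = [("user_a", 10), ("user_b", 5), ("user_a", 7)]

appended = append(events)            # three rows, nothing merged
upserted = upsert(events)            # user_a keeps only its latest value
aggregated = pre_aggregate(events)   # user_a values are summed
```

The same input yields three different stored results, which is why the choice of write behavior is made per table rather than per query.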
What mechanisms does Apache Doris employ to optimize high-concurrency and high-throughput queries?
Apache Doris optimizes queries through a columnar storage engine, MPP architecture, and a cost-based query optimizer. It also utilizes a vectorized execution engine to enhance performance for high-concurrency and high-throughput scenarios.
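As a rough illustration of why a columnar layout and MPP execution help analytical queries, the sketch below pivots rows into columns and then sums one column by splitting it across simulated nodes. All names and data are hypothetical, and this is a conceptual model rather than Doris internals.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def partial_sum(shard):
    """Work done independently by one simulated node."""
    return sum(shard)


def mpp_sum(values, num_nodes):
    """Split a column across nodes, aggregate each shard, merge the partials."""
    shards = [values[i::num_nodes] for i in range(num_nodes)]
    return sum(partial_sum(shard) for shard in shards)


# Hypothetical row-oriented input.
rows = [
    {"city": "Paris", "amount": 30},
    {"city": "Tokyo", "amount": 20},
    {"city": "Paris", "amount": 50},
    {"city": "Lima", "amount": 10},
]

columns = to_columns(rows)
# Only the "amount" column is read; "city" is never touched.
total = mpp_sum(columns["amount"], num_nodes=2)
```

A query that aggregates one column reads only that column's values, and each node handles only its own shard before the partial results are merged.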
Can Apache Doris query data stored in external data lakes and databases?
Yes, Apache Doris supports federated querying, allowing it to access data from various data lakes such as Hive, Iceberg, and Hudi. It can also query data directly from external databases like MySQL and PostgreSQL.
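A federated query typically starts by registering the external source as a catalog. The Python sketch below only renders the SQL text and does not connect to anything. The catalog name, metastore address, and table names are hypothetical. The `CREATE CATALOG ... PROPERTIES` form and the `hms` type are written from the Doris multi-catalog documentation as best understood here, so check the property names against the version in use.

```python
def create_catalog_sql(name, properties):
    """Render a CREATE CATALOG statement from a dict of properties."""
    props = ",\n".join(
        f"    '{key}' = '{value}'" for key, value in properties.items()
    )
    return f"CREATE CATALOG {name} PROPERTIES (\n{props}\n);"


# Hypothetical Hive Metastore endpoint; replace with a real address.
hive_sql = create_catalog_sql(
    "hive_lake",
    {
        "type": "hms",
        "hive.metastore.uris": "thrift://metastore.example.com:9083",
    },
)

# Once registered, external tables are addressed as catalog.database.table
# and can be joined with tables stored in Doris itself (hypothetical names).
federated_query = (
    "SELECT o.order_id, c.name "
    "FROM hive_lake.sales.orders AS o "
    "JOIN internal.crm.customers AS c ON o.customer_id = c.id;"
)
```

Because Doris speaks the MySQL protocol, statements like these can be sent from any MySQL-compatible client.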
What advanced data types and indexing capabilities does Apache Doris offer for complex data and text searches?
Apache Doris supports the compound data types Array, Map, and JSON, plus a Variant data type that infers the schema of JSON data automatically. For text searches, it offers NGram bloom filters and inverted indexes.
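The idea behind an NGram bloom filter can be sketched in a few lines of Python: each data block records the n-grams of its strings in a bloom filter, and a substring search can skip any block whose filter is missing one of the pattern's n-grams. The class, the sizes, and the hashing scheme below are illustrative assumptions, not the Doris implementation.

```python
import hashlib


def ngrams(text, n=3):
    """Return the set of all contiguous n-character substrings."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_block_filter(values, n=3):
    """Index every n-gram of every string stored in one data block."""
    block_filter = BloomFilter()
    for value in values:
        for gram in ngrams(value, n):
            block_filter.add(gram)
    return block_filter


def block_may_match(block_filter, pattern, n=3):
    """False means the block can be skipped for a substring search."""
    return all(block_filter.might_contain(g) for g in ngrams(pattern, n))


block = build_block_filter(["real-time analytics", "columnar storage"])
hit = block_may_match(block, "analytics")
```

A bloom filter never produces false negatives, so a block that contains the pattern is always scanned. False positives are possible, which costs an unnecessary scan but never a wrong result.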
How does Apache Doris ensure scalability and efficient resource management in its distributed architecture?
Apache Doris is built with a distributed design for linear scalability, supporting both shared-nothing clusters and separation of storage and compute. It also incorporates workload isolation and tiered storage to manage resources efficiently.