

Projects & Articles

Projects

Spark SQL - Apache Spark's module for working with structured data.
Hive - The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
Presto - Distributed SQL Query Engine for Big Data.
Impala - The open source, native analytic database for Apache Hadoop.
Druid - A high-performance real-time analytics database.
Kylin - An open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark, supporting extremely large datasets.
HAWQ - Apache Hadoop Native SQL. Advanced, MPP, elastic query engine and analytic database for enterprises.
Drill - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
TiDB - An open source distributed relational database.
ClickHouse - An open source column-oriented database management system capable of real-time generation of analytical data reports using SQL queries.
SnappyData - The Apache Spark Database.
Doris (Palo) - An MPP-based interactive SQL data warehouse for reporting and analysis.
Antlr4 - ANother Tool for Language Recognition.
Calcite - Dynamic data management framework.