
Projects & Articles

Projects

  • Spark SQL - Apache Spark's module for working with structured data.
  • Hive - The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
  • Presto - Distributed SQL Query Engine for Big Data.
  • Impala - The open source, native analytic database for Apache Hadoop.
  • Druid - A high performance real-time analytics database.
  • Kylin - An open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark, supporting extremely large datasets.
  • HAWQ - Apache Hadoop Native SQL. Advanced, MPP, elastic query engine and analytic database for enterprises.
  • Drill - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
  • TiDB - An open source distributed relational database.
  • ClickHouse - An open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries.
  • SnappyData - The Apache Spark Database.
  • Doris (Palo) - An MPP-based interactive SQL data warehouse for reporting and analysis.
  • Antlr4 - ANother Tool for Language Recognition.
  • Calcite - Dynamic data management framework.

Papers

Spark

SIGMOD

Articles

Resources

Spark

Reference