TODO

Improve

Book

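  • Big Data Daily Notes: Architecture and Algorithms (大数据日知录：架构与算法)
  • The Art of Spark Kernel Design: Architecture Design and Implementation (Spark内核设计的艺术：架构设计与实现)
  • Distributed Caching in Depth: From Principles to Practice (深入分布式缓存：从原理到实践)
  • From Paxos to ZooKeeper: Principles and Practice of Distributed Consistency (从Paxos到Zookeeper：分布式一致性原理与实践)
  • Mathematics for Programmers, Volumes 1, 2, 3 (程序员的数学：1,2,3)
  • Algorithms, 4th Edition (算法(第四版))
  • Head First Design Patterns (Head First：设计模式)
  • MySQL Internals: The InnoDB Storage Engine (MySQL技术内幕：InnoDB存储引擎)
  • Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (深入理解Java虚拟机：JVM高级特性与最佳实践)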

Blog

Site

Project

InfluxData

InfluxData provides a modern time series platform, designed from the ground up to handle metrics and events. InfluxData's products are built on an open source core consisting of four projects: Telegraf, InfluxDB, Chronograf, and Kapacitor, collectively called the TICK Stack.

Oryx 2

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, specialized for real-time, large-scale machine learning. It is a framework for building applications, and it also includes packaged, end-to-end applications for collaborative filtering, classification, regression, and clustering.

NLTK

Natural Language Toolkit

DeepWalk

Deep Learning for Graphs

ClickHouse

ClickHouse is an open source, column-oriented database management system capable of generating analytical data reports in real time using SQL queries.

  • Blazing fast
  • Linearly scalable
  • Hardware efficient
  • Fault tolerant
  • Feature rich
  • Highly reliable
  • Simple and handy

ONOS

ONOS is an SDN controller platform that, according to the project, is the only one supporting the transition from legacy "brown field" networks to SDN "green field" networks. The project says this gives network operators new capabilities and substantially lower deployment and operational costs.

SnappyData

SnappyData, the Spark Database.

Stream, transact, analyze, and predict, all in one cluster.

Zeppelin

A web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, and more.

JavaCC

Dr. Elephant

Dr. Elephant is a job- and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark.

Article

Tutorial

Docs

Tool