Knowledge makes me travel through time and space.

Links

Internal Site

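36 Big Data (36大数据)
About Cloud Development (About云开发): focused on big data and cloud technology
MATLAB Chinese Forum
SegmentFault
UDN Enterprise Internet Technology Community
UML Software Engineering Organization - Huolongguo Software Engineering (火龙果软件工程)
Alibaba Middleware Team Blog
Concurrent Programming Network (并发编程网)
Meituan-Dianping Technology Team
Search Technology Blog - Taobao
Taobao Database R&D Group
Taobao Database R&D Group: Database Kernel Monthly Report
Tencent Big Data: Big Data Academy
Tencent Dev Developer Community
Ctrip Technology Center: Technology Sharing
Yunqi Community (云栖社区): Selected High-Quality Blog Posts
Youzan Technology Team
China Cloud Computing: Cloud Computing Resources and Exchange Center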

Foreign Site

Blog

Alexander J. Smola
Andrey Kurenkov’s Web World
Colah’s blog
Geoffrey E. Hinton
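Hellojavacases WeChat Official Account Website
Java Performance Tuning Guide
July: The Method of Structures, the Way of Algorithms (结构之法算法之道)
Jürgen Schmidhuber
Knight: focused on internet advertising, community platforms, resource download platforms, and computer graphics and image technology
Lxw: Big Data Field (大数据田地), Hadoop/Hive/HBase/Spark/Java
MSDN Blogs
Netkiller E-book Series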
Sebastian Thrun
Shai Shalev-Shwartz
Yoshua Bengio
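Dong's Blog (董的博客): focused on large-scale data processing, Hadoop, YARN, MapReduce, Spark, Mesos
The Years of Spending Money (花钱的年华)
Han Xiaoyang (寒小阳): focused on machine learning and data mining
The Beauty of Simplicity (简单之美): Big Data
Kaitao's Blog (开涛的博客)
Li Ding, a.k.a. Zheliang (李鼎(哲良))
Li Shehe (李社河): Keep working hard, young man!
Ruan Chengfeng (阮城锋)
If the Sky Does Not Die (如果天空不死)
Starry Sky (星空): Be a person who is prepared
Little Stone's Code-Crazy Nest (小石头的码疯窝) - ML DL CV
Stars (星星): algorithms, search, distributed systems
Yang Shangchuan (杨尚川): big data, search engines
Zhang Long, a.k.a. Fengzhongye (张龙（风中叶）): Exploring the Unknown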

Architecture

Atlas: High Level Architecture
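Bi Xuan of Alibaba (阿里毕玄): 14 Mistakes I Have Made in System Design
Portrait of an Architect
10 Things Architects Most Fear Programmers Knowing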

Project

Accumulo
Ambari
Apex: Enterprise-grade unified stream and batch processing engine
Atlas: Data Governance and Metadata framework for Hadoop
Avro: a data serialization system.
Alluxio: Open Source Memory Speed Virtual Distributed Storage
Beam
Canal: Alibaba's incremental subscription and consumption component for MySQL database binlogs
Cassandra: Manage massive amounts of data, fast, without losing sleep
Druid: A high-performance, column-oriented, distributed data store.
Falcon: Feed management and data processing platform
Flink: Scalable Batch and Stream Data Processing
Flume
Ganglia Monitoring System
Generatedata: Random data generator in JS, PHP and MySQL
Hadoop: An open-source software for reliable, scalable, distributed computing
HAWQ: Apache Hadoop Native SQL
HBase: A distributed, scalable, big data store.
HBase Blog
HBase ™ Reference Guide
Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Hive Wiki
Hue: Hadoop User Experience
Impala: Real-time Query for Hadoop
IPython
Kafka: A high-throughput distributed messaging system.
Keras: Deep Learning library for Theano and TensorFlow
Kerberos: The Network Authentication Protocol
KeystoneML
Kaldi Speech Recognition Toolkit
Kaldi: GitHub
Knox: REST API Gateway for the Apache Hadoop Ecosystem
Kudu: Fast Analytics on Fast Data
Kylin: an open source Distributed Analytics Engine
Lasagne: Lightweight library to build and train neural networks in Theano http://lasagne.readthedocs.org/
MADlib: Big Data Machine Learning in SQL
Mahout: Scalable machine learning and data mining
Matplotlib: Python Plotting
Metron: Real-Time Big Data Security
Nginx: an HTTP and reverse proxy server
NumPy
Oozie: Apache Oozie Workflow Scheduler for Hadoop
OpenCV: Open Source Computer Vision Library
OpenCV: GitHub
Otter: Alibaba's distributed database synchronization system (for data centers located across China and the US)
Pandas
Parquet: a columnar storage format
Pig
Pivotal Extension Framework (PXF)
Presto: Distributed SQL query engine for big data
Quiver: Interactive convnet features visualization for Keras
Ranger: Enable, monitor and manage comprehensive data security across the Hadoop platform
Scikit-learn
SciPy
Shellinabox: Web based AJAX terminal emulator
Slider: Dynamic YARN Applications
Spark: Lightning-fast cluster computing
Sqoop
Storm: A free and open source distributed realtime computation system
streamDM: Data Mining for Spark Streaming
SymPy
SystemML: Declarative Large-Scale Machine Learning
Succinct: Enabling Queries on Compressed Data
TensorFlow: an Open Source Software Library for Machine Intelligence
TensorLayer: Deep Learning and Reinforcement Learning Library for TensorFlow
Tesseract Open Source OCR Engine
Tez
TFLearn: Deep learning library featuring a higher-level API for TensorFlow
Vert.x: a tool-kit for building reactive applications on the JVM.
Zeppelin: A web-based notebook that enables interactive data analytics.
ZooKeeper: A high-performance coordination service for distributed applications.

Resources

Anaconda Cloud: Search
Datahub
Deep Learning Resources: NVIDIA Developer
GitBook
InfoQ Mini-Books
Kaggle Datasets
Python Extension Packages for Windows: Christoph Gohlke
Read the Docs
Seminar Schedule of Protein Structure Group
Spark Packages
Stanford Engineering Everywhere: Course
Stanford University Explore Courses
UCI Machine Learning Repository: Data Sets

Docs/Wiki

ANACONDA Documentation
Cloudera Product Documentation
Conda documentation
Deep Learning Tutorials
Deep Learning: An MIT Press book
Hortonworks Documentation
MapR Documentation
Pivotal Documentation
Redhat Product Documentation
Spring Documentation
Transwarp Download
Stanford Machine Learning
UFLDL Tutorial

Tools

Try Pandoc (a universal document converter)!
MSDN, I Tell You (MSDN, 我告诉你)

Research/Reports

CMMI Resources
Technical Reports: EECS at UC Berkeley
Big Data (Big Data Research: BDR)

Conferences

Computational Linguistics / NLP Conferences Calendar
Conferences Archive: O’Reilly Media
NIPS Conference
Spark Summit: The premier event series of Apache Spark
USENIX ATC Conferences
USENIX NSDI Conferences

GitHub

Deeplearning4j: Open-source, distributed deep learning for the JVM on Spark with GPUs
Gliese581gg: Jinyoung Choi
Iluwatar: Ilkka Seppälä
Tobegit3hub: Storage(HBase, Ceph etc), IaaS(Linux, OpenStack etc) and Machine Learning with Kubernetes and TensorFlow.
Ty4z2008: Jun Liao

Other

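OneAPM: an end-to-end cloud solution for application performance management software and an application performance monitoring platform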
The Hadoop Ecosystem Table
Sort Benchmark

HyperJ

Links

Han Xiaoyang (寒小阳)
dailidong
Time Drifting Away (Time渐行渐远)
Crazy Programmer (程序员疯子)
Programming Little Dream (编程小梦)
Hexiaoqiao
Zhan Xiaolang (占小狼)

© 2013 – 2018 HyperJ