Practice of distributed processing for massive GIS data based on HBase

LI Xuemei1, XING Junfeng1, LIU Dawei1, WANG Haiyang1,2, LIU Wei1,2

1. Institute of Network Technology, ICT (YANTAI), Yantai 264003, Shandong, China

2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

Abstract: A GIS data management system based on the distributed database HBase is designed. The system optimizes the generation and storage of raster data by writing massive raster data directly into HBase for storage and indexing. For the storage, indexing, and retrieval of vector spatial data, a new rowkey design is proposed that takes into account not only latitude and longitude but also spatial data type and attributes, so that when vector geographic information is retrieved by spatial location, the data to be returned can be located quickly through the HBase rowkey. The methods were verified on an HBase cluster with real GIS data. The results show that the proposed system achieves high storage and retrieval performance on massive data, providing efficient storage and real-time, high-speed retrieval of massive geographic information data.

Key words: big data; HBase; raster data; vector data; rowkey
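
The abstract does not give the byte layout of the proposed rowkey, so the following is only an illustrative sketch of one way a rowkey could combine position with data type. It assumes a Z-order (bit-interleaved) encoding of latitude and longitude as the leading bytes, followed by a one-byte type code and an eight-byte feature identifier. The function names, field widths, and the choice of Z-order are all hypothetical and are not taken from the paper.

```python
def interleave_bits(lat: float, lon: float, bits: int = 16) -> int:
    """Quantize lat/lon to `bits` bits each and interleave them (Z-order).

    Hypothetical encoding: longitude bits occupy the odd positions and
    latitude bits the even positions of the resulting integer.
    """
    max_cell = (1 << bits) - 1
    y = int((lat + 90.0) / 180.0 * max_cell)
    x = int((lon + 180.0) / 360.0 * max_cell)
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def make_rowkey(lat: float, lon: float, type_code: int,
                feature_id: int, bits: int = 16) -> bytes:
    """Build an assumed rowkey: [spatial code][type code][feature id].

    The spatial code comes first so that rows in the same cell are
    stored contiguously; the type code then groups rows within a cell.
    """
    width = (2 * bits + 7) // 8  # bytes needed for the interleaved code
    spatial = interleave_bits(lat, lon, bits).to_bytes(width, "big")
    return spatial + type_code.to_bytes(1, "big") + feature_id.to_bytes(8, "big")


if __name__ == "__main__":
    # Two features at the same location with different (made-up) type codes.
    road = make_rowkey(37.5, 121.4, type_code=1, feature_id=5)
    building = make_rowkey(37.5, 121.4, type_code=2, feature_id=0)
    print(road.hex(), building.hex())
```

Because HBase stores rows sorted lexicographically by rowkey bytes, a layout of this kind keeps all rows for one spatial cell adjacent, ordered by type and then by identifier, so a range scan over the spatial prefix returns them together. Whether the type code should precede or follow the spatial code depends on the dominant query pattern, which the abstract does not state.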
