Skip to content

数据质量(Data Quality)

Measure Model

Accuracy(准确性)

数据准确,无异常和错误

Do data objects accurately represent the “real-world” values they are expected to model?
Incorrect attribute of selling items across several systems can impact operational and analytical applications.

方案:对比数据是否一致,通过维度约减、数据范围及数据分组提升性能

Profiling(统计信息)

数据基础的统计信息

Apply statistical analysis and assessment of data values within a dataset for consistency, uniqueness and logic

方案:对数据基础信息进行统计分析,值域、长度、计数、基数、频率、空值率、直方图等

Completeness(完整性)

数据概念完整,满足业务需要

Is all necessary data present

方案:检查数据的完整性,基于统计信息或是文件元数据

相关:异常值、元数据、链路

Timeliness(时效性)

数据能否按时产出

Is the data available at the time needed

方案:基于文件、数据、任务时间等

相关:SLA、优先级

Anomaly detection(异常检测)

数据是否合理,满足期望

Pre-built algorithm functions for the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset

方案:基于统计信息、异常检测、时间序列分析等

相关:空值、基数、波动

Validity(有效性)

数据能满足业务约束

Are all data values within the data domains specified by the business

方案:基于统计信息、自定义规则或函数

相关:范围、波动、极值、均值、方差、特殊字符与乱码

Reference