23 Tips for Creating Effective Data Quality Scorecards
Data quality scorecards are among the most valuable tools for guiding your team and keeping your project sponsors motivated to pursue data quality management.
But how do you go about creating a data quality scorecard, and what are the key points to consider?
Here are 23 pointers to set you in the right direction.
Data Quality Scorecards: 23 Tips
Make your data quality scorecard the centrepiece of your data quality initiative
Scorecards are key to understanding how well the data supports various reports, analytical and operational processes, and data-driven projects
They are critical for making good decisions about data quality improvement initiatives.
Without a data quality scorecard, all you have are raw materials and no value-added product to justify further investment in data quality management.
Consider building your data quality scorecard in four levels:
Score Summary
Score Decompositions
Intermediate Error Reports
Atomic Level Data Quality Information
Well-designed aggregate scores are goal-driven: they let us evaluate how fit the data is for various purposes and indicate the quality of the various data collection processes.
When it comes to understanding data quality and its impact on the business, aggregate scores are the key piece of data quality metadata.
The data quality scorecard is a collection of aggregate scores
Aggregate scores help make sense of the numerous error reports produced during a data quality assessment. Without aggregate scores, error reports often discourage rather than enable data quality improvement.
Be careful when selecting aggregate scores to measure: scores not tied to a meaningful business objective are useless.
A single aggregate score computed over the entire database is usually of little value.
Good aggregate scores are goal-driven and help us make better decisions and take action. Poorly designed aggregate scores are just meaningless numbers.
It is possible and desirable to build many different aggregate scores by selecting different groups of target data records. The most valuable scores measure data fitness for various business uses
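To make the idea of goal-driven aggregate scores concrete, here is a minimal Python sketch. It assumes each record carries the list of data quality rules it fails and defines a score as the share of target records that pass every rule; the record fields, the two business uses, and the `aggregate_score` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Record:
    record_id: int
    region: str
    active: bool
    failed_rules: List[str] = field(default_factory=list)  # rules this record violates

def aggregate_score(records: List[Record],
                    selector: Callable[[Record], bool]) -> Optional[float]:
    """Share of the selected target records that pass every rule (0.0 to 1.0)."""
    targets = [r for r in records if selector(r)]
    if not targets:
        return None  # no target records: report "no score", not a perfect score
    clean = sum(1 for r in targets if not r.failed_rules)
    return clean / len(targets)

# Hypothetical sample records.
records = [
    Record(1, "EU", True),
    Record(2, "EU", True, ["missing_email"]),
    Record(3, "US", False, ["invalid_zip"]),
    Record(4, "US", True),
]

# One score per business use, each computed over its own target population.
scores = {
    "marketing (active customers)": aggregate_score(records, lambda r: r.active),
    "EU reporting": aggregate_score(records, lambda r: r.region == "EU"),
}
print(scores)
```

The same records yield different scores for different uses, which is exactly why a single database-wide number says so little. Returning `None` for an empty target group avoids reporting a perfect score for data nobody uses.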
Scorecards allow us to estimate the cost of bad data to the business, to evaluate the potential ROI of data quality initiatives, and to set realistic expectations for data-driven projects.
If, for example, you define the objective of a data quality assessment project as calculating the cost of bad data to the business, you will have a much easier time finding sponsors for your initiative.
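A rough cost estimate can be sketched from scorecard counts. In this minimal Python example the rule names, error counts, and per-record costs are invented placeholders; real unit costs would have to come from the business, and the simple ROI formula is only one common way to express the return.

```python
# Hypothetical scorecard output: number of records failing each rule.
error_counts = {
    "invalid_address": 1200,
    "duplicate_customer": 300,
}
# Assumed cost per affected record (placeholder figures, e.g. returned mail,
# duplicated marketing spend); real values must come from the business.
unit_cost = {
    "invalid_address": 2.5,
    "duplicate_customer": 10.0,
}

def estimated_cost(counts: dict, costs: dict) -> float:
    """Sum over rules of (records affected x assumed cost per record)."""
    return sum(n * costs[rule] for rule, n in counts.items())

def simple_roi(cost_avoided: float, project_cost: float) -> float:
    """Net benefit of a data quality project relative to its cost."""
    return (cost_avoided - project_cost) / project_cost

total = estimated_cost(error_counts, unit_cost)  # 1200 * 2.5 + 300 * 10.0 = 6000.0
# Optimistic: assumes the project would eliminate all of this cost.
print(total, simple_roi(total, project_cost=4000.0))
```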
It is usually important to know whether data errors are mostly historic or were introduced recently, since recent errors point to problems in the processes that are still feeding the data.
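One way to answer this question is to compare error rates for older and newer records. The sketch below assumes each record has a creation date and a flag saying whether it fails any rule; the cutoff date and sample data are hypothetical. It compares rates rather than raw counts so that a large backlog of old records does not dominate the picture.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical sample: (record creation date, record fails at least one rule)
records = [
    (date(2015, 3, 1), True),
    (date(2016, 7, 9), True),
    (date(2018, 1, 15), True),
    (date(2019, 5, 2), False),
    (date(2023, 11, 20), True),
    (date(2024, 2, 3), False),
    (date(2024, 6, 30), False),
    (date(2024, 9, 12), False),
]

def error_rate_by_age(rows: Iterable[Tuple[date, bool]],
                      cutoff: date) -> Dict[str, Optional[float]]:
    """Error rate for records created before the cutoff ("historic")
    and on or after it ("recent")."""
    totals = {"historic": 0, "recent": 0}
    errors = {"historic": 0, "recent": 0}
    for created, has_error in rows:
        period = "historic" if created < cutoff else "recent"
        totals[period] += 1
        errors[period] += int(has_error)
    # Rates rather than raw counts; None when a period has no records.
    return {p: errors[p] / totals[p] if totals[p] else None for p in totals}

print(error_rate_by_age(records, cutoff=date(2023, 1, 1)))
```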
Score decompositions show how different components contribute to the overall data quality score.
Score decompositions can be built along many dimensions, including data elements, data quality rules, subject populations, and record subsets
The level of detail obtained through score decompositions is enough to understand where most data quality problems come from.
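A decomposition along the data element or rule dimension can be sketched by counting rule failures. This minimal Python example assumes a flat list of failures, each tagged with the data element and rule involved (all names hypothetical), and reports each group's share of all failures, largest contributors first. A record that fails several rules is counted once per failure.

```python
from collections import Counter
from typing import Dict, List, NamedTuple

class Failure(NamedTuple):
    record_id: int
    data_element: str
    rule: str

# Hypothetical rule failures collected during an assessment.
failures = [
    Failure(1, "email", "email_format"),
    Failure(2, "email", "email_format"),
    Failure(2, "zip", "zip_valid"),
    Failure(3, "birth_date", "date_in_range"),
    Failure(4, "email", "email_not_null"),
]

def decompose(rows: List[Failure], dimension: str) -> Dict[str, float]:
    """Share of all failures attributable to each value of one dimension
    ("data_element" or "rule"), largest contributors first."""
    counts = Counter(getattr(row, dimension) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.most_common()}

print(decompose(failures, "data_element"))  # email accounts for 3 of 5 failures
print(decompose(failures, "rule"))
```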
For a more detailed breakdown, produce intermediate reports of the individual errors that contribute to each score or sub-score.
Detailed reports can be filtered and sorted in various ways to better understand the causes, nature, and magnitude of the data problems
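The sketch below shows one hypothetical shape for such an intermediate report: a list of error rows that can be filtered by rule or source system, sorted, and summarized by the most frequent offending values. The row fields and source system names are assumptions for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple

class ErrorRow(NamedTuple):
    record_id: int
    rule: str
    bad_value: str
    source_system: str

# Hypothetical intermediate error report: one row per individual error.
report = [
    ErrorRow(14, "email_format", "n/a", "web_form"),
    ErrorRow(12, "zip_valid", "0000", "web_form"),
    ErrorRow(11, "zip_valid", "0000", "web_form"),
    ErrorRow(13, "zip_valid", "ABCDE", "call_center"),
]

def filter_report(rows: List[ErrorRow], rule: Optional[str] = None,
                  source: Optional[str] = None) -> List[ErrorRow]:
    """Keep rows matching the given rule and/or source system, sorted by record id."""
    kept = [r for r in rows
            if (rule is None or r.rule == rule)
            and (source is None or r.source_system == source)]
    return sorted(kept, key=lambda r: r.record_id)

def top_bad_values(rows: List[ErrorRow], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent offending values; a repeated value often hints at a systematic cause."""
    return Counter(r.bad_value for r in rows).most_common(n)

zip_errors = filter_report(report, rule="zip_valid")
print([r.record_id for r in zip_errors])
print(top_bad_values(zip_errors))
```

Here the repeated "0000" value, all from the web form, would be a natural first lead when investigating the cause.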
At the very bottom of the data quality scorecard pyramid are reports showing the quality of individual records or subjects. These atomic-level reports identify the records and subjects affected by errors, and they can even estimate the probability that each data element is erroneous.
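A minimal Python sketch of an atomic-level report follows. It assumes each rule declares which data elements it checks, and it spreads the blame for each failed rule equally across those elements. This is a deliberately naive heuristic for ranking suspect elements within a record; it is not a calibrated probability of error, which would require a proper statistical model. The rule catalogue and failures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rule catalogue: rule name -> data elements the rule checks.
RULE_ELEMENTS: Dict[str, List[str]] = {
    "zip_matches_city": ["zip", "city"],
    "email_format": ["email"],
    "age_matches_birth_date": ["age", "birth_date"],
}

# Hypothetical assessment result: record id -> rules that record violated.
FAILED: Dict[int, List[str]] = {
    7: ["zip_matches_city", "email_format"],
    9: ["age_matches_birth_date"],
}

def atomic_report(failed: Dict[int, List[str]]) -> Dict[int, Dict[str, float]]:
    """For each affected record, a naive suspicion score per data element.

    Each failed rule spreads one unit of suspicion equally over the elements it
    checks. Scores rank suspects within a record; they are NOT probabilities.
    """
    report = {}
    for record_id, rules in failed.items():
        suspicion: Dict[str, float] = defaultdict(float)
        for rule in rules:
            elements = RULE_ELEMENTS[rule]
            for element in elements:
                suspicion[element] += 1 / len(elements)
        report[record_id] = dict(suspicion)
    return report

for record_id, element_scores in atomic_report(FAILED).items():
    print(record_id, element_scores)
```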
Building and maintaining a dimensional, time-dependent data quality scorecard must be one of the first priorities in any data quality management initiative.
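As a hedged illustration, the sketch below reads "dimensional, time-dependent" as a pass rate for every combination of time period and one business dimension, giving one time series per dimension value. The check results, the business-unit dimension, and the monthly granularity are all assumptions; other dimensions, such as data element or rule, would work the same way.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical rule-check results: (check date, business unit, record passed all rules)
results = [
    (date(2024, 1, 15), "sales", True),
    (date(2024, 1, 20), "sales", False),
    (date(2024, 1, 22), "billing", True),
    (date(2024, 2, 3), "sales", True),
    (date(2024, 2, 9), "sales", True),
    (date(2024, 2, 11), "billing", False),
]

def build_scorecard(rows: Iterable[Tuple[date, str, bool]]) -> Dict[Tuple[str, str], float]:
    """Pass rate for every (month, business unit) pair, in chronological order."""
    passed: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for checked_on, unit, ok in rows:
        key = (checked_on.strftime("%Y-%m"), unit)  # "YYYY-MM" sorts chronologically
        total[key] += 1
        passed[key] += int(ok)
    return {key: passed[key] / total[key] for key in sorted(total)}

for (month, unit), score in build_scorecard(results).items():
    print(month, unit, f"{score:.0%}")
```

Keeping the scores keyed by period makes trends visible: a unit whose score drops from one month to the next is a signal worth investigating before it shows up in downstream reports.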