Creating a Data Quality Firewall and Data Quality SLA
How do you prevent poor supplier data quality from impacting your organisation?
The answer? By adopting a Data Quality Firewall and an ongoing Data Quality SLA process.
This post provides a blueprint for building your own Data Quality Firewall and ensuring your Data Quality SLA delivers certified, high-quality data throughout your organisation.
In 2005 I was consulting at a small utilities company running a data quality assessment project prior to a data migration.
I got everyone in a room and started a simple information chain workshop task, sticking a load of Post-its on the whiteboard to map out the people, processes and data chains that mattered to the team.
We talked at length about the issues people were seeing in the data.
People were generally happy but one person was quiet. So I asked what they did.
It transpired that this person’s role was to take the same piece of data from a major external data supplier into the business - every single month.
Each month, they sat at a computer, laboriously fixing data defects manually.
What’s more, it was the same type of defects that they had to fix, again and again.
It was reassuring they had some kind of ‘Data Quality Firewall’ to prevent inbound data quality issues, but there were obvious flaws in their approach:
The company refused to push back on the supplier and demand a data quality Service Level Agreement (SLA). Yes, this can be a challenge, but in this case they hadn’t exhausted the options for rejecting the data at all.
This person had a huge amount of undocumented knowledge in their head and was demoralised. They were a high churn risk and, obviously, a major risk to the company if they quit.
Doing stuff over and over again in the name of data quality is just bad practice (and extremely expensive).
The lead times for clean up meant major hassles downstream as the data consumers waited.
So what were they doing right?
Someone was assigned to this “firewall” (yes, it was a screwed up process but at least there was ownership)
They were diligent (nothing bad got through, according to downstream users)
They were trapping the data at source (nothing sneaked past this guy until it was ready)
The fact is that many organisations get the data supplier relationship wrong by:
Assuming the data will be of ‘good enough’ quality, because they have a contract of supply
Applying manual hacks and tweaks downstream (incurring costs the supplier should bear)
Applying a dedicated person or team at the point of entry (instead of building automation and data quality monitoring to free up the labour and push the defects back to the supplier)
It’s incredible how many companies I meet that accept data from outside the corporate walls and assume it is correct, or simply believe ‘defects are reality, we can live with them’.
Don’t make this assumption.
Take a proactive stance using the following steps as an outline guide for getting started.
Simple Roadmap for a Data Quality Firewall and Data Quality SLA
When creating a Data Quality Firewall and SLA you may find the following tactics valuable.
Map the flow of information from supplier to consumer
Map all the information sources flowing into your organisation. If you don’t have budget for an expensive modelling tool, I find that a pack of Post-its and a communal bag of donuts is sufficient to run your first modelling workshop.
Identify what data specifications exist (and create where they’re missing)
When information comes from an external source, identify whether the following exists:
Formal data specifications (e.g. field formats, frequency of delivery, expected values, permitted ranges)
Escalation procedures or standard operating procedures for when defects are found
If no specification, documentation or process information exists, create new documentation listing the elements above, but also add:
Simple information chain diagram explaining where this information comes in and feeds to
Names and contact details for everyone involved with this data, both on the supplier and consumer side
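Where no formal specification exists, even a lightweight, machine-readable one is a good starting point. Here is a minimal sketch of what such a specification might capture; the field names, formats, delivery window and contact addresses are all illustrative assumptions, not part of any real feed:

```python
# A hypothetical data specification for an inbound order feed.
# Every name, format and value here is illustrative -- replace them
# with whatever your supplier actually delivers.
ORDER_FEED_SPEC = {
    "delivery": {
        "frequency": "every working day",
        "window": ("09:00", "10:00"),  # expected arrival window (GMT)
        "format": "CSV",
    },
    "fields": {
        "order_number": {"mandatory": True, "pattern": r"\d{2}-[A-Z]{4}-\d{2}"},
        "customer_name": {"mandatory": True},
        "quantity": {"mandatory": True, "min": 1, "max": 10_000},
    },
    "contacts": {
        "supplier": "supplier-data-team@example.com",  # placeholder
        "consumer": "data-quality@example.com",        # placeholder
    },
}
```

Keeping the specification in a structured form like this means the same document can later drive automated validation rather than living only in someone’s head.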
Present your findings and recommendations for Data Quality
Present this to the relevant stakeholder explaining your desire to improve this inbound information source with a view to:
Preventing embarrassment to the stakeholder when bad data impacts other business units (I would probably open with this)
Reducing costs by building a process that eliminates endless amounts of scrap and rework activity (cost estimates based on past incidents are ideal here)
Increasing overall perception of the stakeholder’s team as a highly professional, innovative resource (they will like this)
Reducing the lead times and improving various other metrics the stakeholder will be held accountable for
Get them to sign off that they are ultimately responsible for this process, and commit to providing regular reports on how the process is working and the value that it (and they) are bringing to the business
Recommend a Pilot Data Quality Management Initiative:
Document all known issues with this inbound information source, drawing on surveys and casual conversations with data workers, DBAs, app designers and anyone else who touches this data downstream
Armed with this anecdotal data, create a robust data quality assessment process:
Profile the data using one of the many free (or commercial) tools now available
Rigorously document the data quality rules you feel the data should be managed against, e.g.
Timeliness – the data must arrive between 9am and 10am every weekday morning
Completeness – the customer name and address fields must be populated, there must be a valid order number etc.
Formatting – the order number must be in the format of NN-LLLL-NN, with no exceptions
Overloading – each record must only have one entry in the part code, there must not be multiple part codes in the same field
etc….
Convert these data quality rules into a live monitoring process, e.g.
Use one of the many data quality and data integration tools available
Use standard scripts in SQL or Unix, or whatever your data processing platform uses
Implement the process and start to track the issues discovered
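The example rules above translate naturally into automated checks. The following is a minimal sketch, assuming hypothetical field names (`customer_name`, `address`, `order_number`, `part_code`) and the thresholds from the examples; a real firewall would load these from your data specification:

```python
import re
from datetime import time

def check_timeliness(arrival: time) -> bool:
    """Timeliness: the file must arrive between 09:00 and 10:00."""
    return time(9, 0) <= arrival <= time(10, 0)

def check_completeness(record: dict) -> bool:
    """Completeness: name, address and order number must all be populated."""
    return all(record.get(f) for f in ("customer_name", "address", "order_number"))

def check_format(record: dict) -> bool:
    """Formatting: the order number must match NN-LLLL-NN, no exceptions."""
    return bool(re.fullmatch(r"\d{2}-[A-Z]{4}-\d{2}", record.get("order_number", "")))

def check_overloading(record: dict) -> bool:
    """Overloading: the part code field must hold exactly one code."""
    part_code = record.get("part_code", "")
    return "," not in part_code and ";" not in part_code

def run_checks(records: list, arrival: time) -> dict:
    """Run every rule against a delivery and tally defects per rule.

    Timeliness is checked once per delivery; the other rules per record.
    """
    issues = {"timeliness": 0, "completeness": 0, "format": 0, "overloading": 0}
    if not check_timeliness(arrival):
        issues["timeliness"] += 1
    for rec in records:
        if not check_completeness(rec):
            issues["completeness"] += 1
        if not check_format(rec):
            issues["format"] += 1
        if not check_overloading(rec):
            issues["overloading"] += 1
    return issues
```

Running `run_checks` against each inbound delivery and persisting the tallies gives you exactly the issue history you need for the SLA discussion in the next step.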
Review the impact of any data defects and request a more robust ‘Data Quality SLA’ from your data supplier:
Run this process for at least a month, discovering issues and fixing manually where applicable
Create a comprehensive file of all the issues found and the impacts they’re having in your organisation
Approach the data supplier outlining your findings, the innovations you have made and the issues you are picking up
If no SLA or formal definitions and agreements around data quality exist, work with your stakeholder and the supplier to get something in place
Agree to share your technology and approach with the supplier so that they can improve their data quality (chances are it’s being supplied to other companies too)
Work together to iteratively create the most robust data quality firewall and SLA process possible
Sample Sections of a Data Quality SLA
Please note: this information is provided merely as an introductory guide, without prejudice. It is not intended as legal advice; always seek professional assistance when creating legal documentation.
There are hundreds of templates for an SLA on the web but here is one example of how you could apply it to data quality:
1. Introduction
This Service Level Agreement (SLA) documents the agreed provision of service for the supply of data by [Supplier] to [Receiver]. This document provides a legally binding contract that stipulates the service and quality levels that will be enforced during the term of the service agreement between both parties.
2. Parties to the Agreement
Lists the people who have reviewed and approved the SLA.
3. Scope
High-level outline of the data and services to be provided.
4. Term
Indicate the start and end date of the SLA.
5. Conventions
List all the standards, terms and definitions that are referenced throughout the SLA. For example:
“Office Hours” refers to 9am to 5pm GMT.
“Working Day” refers to Monday to Friday except for UK designated holidays.
“CSV” refers to “Comma-Separated Values”, a raw text file format as described in this open standard: http://mastpoint.curzonnassau.com/csv-1203/csv-1203.pdf
etc…
6. Service Levels
Data Delivery Schedule: Indicates how frequently data should be supplied, e.g. every working day between 9am and 10am GMT.
Data Delivery Specification: How will the data be structured? Which fields are mandatory? What standard data types will exist? What CSV standard must be adhered to? What standards for identifiers and codes must be adhered to? How will the data be packaged and encrypted, e.g. an encrypted zip file? What is the data standard and specification for attribute A, B, C, …, n?
Data Delivery Process: High-level diagram outlining the steps required for data delivery.
Escalation Process: When issues are found, how will the supplier be notified, what is expected of them?
7. Service Level Targets and Penalties
Describe each service level, its measure and its target. For example:
Measure: Timeliness
Description: Data must be supplied within the approved data delivery schedule
Target: 95% annual delivery within the approved data delivery schedule, 97% within approved data delivery schedule + 30 minutes, 100% within approved data delivery schedule + 60 minutes
Penalties: £10 for every minute delay up to 60 minutes. Past 60 minutes £1000 per hour delay.
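Penalty clauses like this are worth expressing unambiguously, because wordings such as the example above admit more than one reading. Here is one possible interpretation as a sketch, assuming £10 per minute for the first 60 minutes and then a further £1,000 per started hour beyond that; your own SLA should spell out whichever reading the parties actually intend:

```python
def delay_penalty(delay_minutes: int) -> int:
    """Penalty in GBP for a late delivery, under one reading of the
    example terms: 10 GBP per minute for the first 60 minutes, then a
    further 1000 GBP for each started hour beyond the first 60 minutes.
    """
    if delay_minutes <= 0:
        return 0
    if delay_minutes <= 60:
        return 10 * delay_minutes
    extra_hours = -(-(delay_minutes - 60) // 60)  # ceiling division
    return 10 * 60 + 1000 * extra_hours
```

Walking stakeholders through a few concrete delays (30 minutes costs £300, two hours costs £1,600 under this reading) is a quick way to flush out disagreements before the SLA is signed.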
8. Rewards
Outline any reward structures that the supplier receives for meeting their SLA targets.
9. Points of Contact
List of key personnel involved in the service from both parties.
10. Signatures
List of authorised signatures to make the agreement binding.
11. Appendix, Glossary and References
Provide additional resources such as sample file extracts, a detailed glossary and links to other standards and external information relevant to the SLA.