When Is It Impossible To Practice Root-Cause Data Quality Defect Prevention?

If you read the articles, webinars and interviews on Data Quality Pro regularly then you’ll know that root-cause prevention of data defects is by far the best outcome and something our expert contributors preach relentlessly.

But what about those situations when this simply isn’t feasible?

Data Quality Root-Cause Prevention - Prevented by Contracts

This post was sparked by a great comment made by one of our members, Daniel Bucosky, an Enterprise Information Quality expert and senior manager in financial services, during the recent lively debate titled: What initial activities do you undertake as a new data quality manager?

Daniel made a great point that in many cases the way our contracts are set up with suppliers of services or systems means that root-cause prevention is often impractical or actually in breach of a contract:

  1. A majority of systems in the financial institution where I work are vendor owned and supported. The associated vendor agreements DO NOT specify any clause(s) addressing data quality nor potential systems changes required to address data quality issues. This poses quite a challenge for remediation of data driven concerns.
  2. Service level agreements (SLA) from external data suppliers also present challenges for remediation if not carefully crafted to support quality data (e.g. external data provider DQ practices, support for data correction, timely response to DQ correction requests, etc.). - Daniel Bucosky

This got me thinking: in which situations is root-cause prevention simply not achievable, so that downstream cleansing and tactical, localised improvements are the only way forward?

Here are my examples, but what about yours? Can you think of some more? Please add your thoughts below or in the forum discussion I created here:

Root-cause Data Quality Defect Prevention – When Is This Simply Not Achievable?

Example #1: Data is supplied by an upstream company outside of your “data quality firewall”

I once consulted on a small engagement where a utilities company was receiving monthly feeds from a large mainstream utilities company that supplied them with household location, engineering and usage data.

The challenge they faced was that the data had known defects but the supplier refused to resolve them. The data was provided “as-is” with no warranty, so the downstream data team in the smaller company simply had no option but to manually correct the data every month.
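When the same known defects arrive every month, the manual correction can at least be captured as repeatable rules. The following is only a minimal sketch of that idea, not what the client actually did: the column names (`household_id`, `postcode`, `usage_kwh`) and both rules are hypothetical, invented purely for illustration.

```python
import csv
import io

# Hypothetical correction rules for known, recurring defects in a supplier
# feed. The field names and the rules themselves are assumptions.

def fix_postcode(row):
    # Normalise case and spacing, e.g. "ab1  2cd" becomes "AB1 2CD".
    compact = "".join(row["postcode"].split()).upper()
    if len(compact) > 3:
        row["postcode"] = compact[:-3] + " " + compact[-3:]
    else:
        row["postcode"] = compact
    return row

def blank_negative_usage(row):
    # Treat a negative usage figure as missing rather than guessing a value.
    if row["usage_kwh"].strip().startswith("-"):
        row["usage_kwh"] = ""
    return row

RULES = [fix_postcode, blank_negative_usage]

def cleanse_feed(text):
    """Apply every rule to every row; return the rows and a count of rows changed."""
    cleaned, changed = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        original = dict(row)  # copy taken before the rules mutate the row
        for rule in RULES:
            row = rule(row)
        if row != original:
            changed += 1
        cleaned.append(row)
    return cleaned, changed
```

Counting the rows changed each month also gives the downstream team evidence of the defect volume, which may help if the supplier agreement is ever renegotiated.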

Example #2: Time restricted data migration projects

If you’re involved with a large data migration then there are typically several options for maintaining the quality of data flowing into the new target system.

  • Option #1: Cleanse in the target system (not advisable but still commonly practiced)
  • Option #2: Cleanse within the migration logic during the migration execution
  • Option #3: Cleanse within a suitable staging area prior to the migration execution
  • Option #4: Cleanse during the extract from the legacy systems
  • Option #5: Cleanse directly on the legacy system
  • Option #6: Perform root-cause prevention on the legacy system so that newly created data is correct and retrospectively cleanse any older data

Option 6 obviously makes the most sense: if the project takes 12 months, the current users of the legacy system get better quality data in the meantime, and the costs incurred by the migration team (e.g. staging area management, coding, design, testing, specialist products etc.) are minimized.

The problem is that in many cases we simply can’t go in and start changing the legacy environment to suit the new target system.

In one situation, a UK client had several mainframe systems that would not permit the kind of root-cause prevention required within the tight timeframe the project demanded.

Example #3: The system is effectively closed with no change access

Data Quality Pro has gone through several iterations of back-end technology infrastructure. Our previous content management system had a quirk that did not allow members to enter spaces in their second name, so names such as Daragh O Brien or Michael de Winter could not be entered correctly. Daragh, for example, was forced to write “Daragh OBrien”.
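For illustration, here is a rough sketch of what a more forgiving check might look like. It is not the content management system's actual validation, which I never had access to. The pattern and the function name `is_valid_surname` are assumptions: the rule accepts groups of letters separated by a single space, apostrophe or hyphen.

```python
import re

# Assumed rule: one or more groups of letters (accented letters included),
# separated by a single space, apostrophe or hyphen.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '’\-][^\W\d_]+)*")

def is_valid_surname(value):
    """Return True if the trimmed value matches the assumed surname rule."""
    return NAME_PATTERN.fullmatch(value.strip()) is not None
```

Under this rule “O Brien”, “de Winter” and “O'Brien” would all be accepted, while digits and doubled spaces would still be rejected.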

We got a number of complaints about this and as a site that prides itself on data quality best-practice it was obviously quite ironic and very embarrassing!

The challenge was that my hands were tied. 

I didn’t have access to the underlying code, which is exactly the same kind of issue Daniel Bucosky describes above. I did eventually outline the problem to the developers and they made the change, but it took many months because no-one else was complaining and they were under no legal obligation to fix it. As a result, some of our member name data was formatted incorrectly, and this really does irritate people.

Some people pointed out that I should have built the system myself or had a team build it for me. But if you take a look at the features in our latest community platform, you will see that the team behind it has spent 10 years refining the product into a state-of-the-art community system. There is no way I could afford that level of investment on my own, and it would take far too long.

This is why companies choose Commercial Off-The-Shelf (COTS) tools in the first place, but the downside is often that you can't correct data quality root-cause issues in a time-frame that suits you.

Example #4: The system is beyond its support contract with no access to expertise

Consider an ageing banking mainframe system that is no longer supported and whose original development team is no longer available. In many senses this also becomes a closed system that no-one is willing to tackle. Banks may instead decide to take feeds out of the system, cleanse the data and then run any financial calculations or billing processes outside of the main banking logic of the mainframe.

I've been on several data migration and data quality projects where there was simply a lack of specialist skills to make those upstream corrections, so we had to be pragmatic about what we could motivate the customer to achieve.

These are just four examples. Where else have you seen root-cause prevention prove unachievable?

Please share your experiences so we can build up a comprehensive list and perhaps get a debate going on this issue.

Thanks again to Daniel Bucosky for sparking this debate, which I'm sure reflects a common challenge many companies face.