How to Transform Your ETL Tool Into a Data Quality Toolkit

Because of the way ETL tools process information flows, they can support a variety of data quality functions

by Dylan Jones, Editor

Many organisations have some form of ETL (Extract-Transform-Load) tool in their business, but how many are really exploiting its potential?

ETL products have been around for many years and have more than proven their worth for shunting data around the enterprise, but it is surprising how many organisations have failed to recognise their potential as data quality tools.

In this post we will present the case for using ETL tools to help get your data quality initiative off the ground.

We're not claiming they are a direct replacement for high-end data quality platforms, but they might just give you an interim solution that delivers some quick wins, which could in turn pay for the more expensive data quality suite you've been struggling to get budget for.


Transforming your ETL tool into a data quality toolkit

In order to determine whether ETL tools can deliver a useful data quality function, we need some form of benchmark. We need to classify the typical data quality tasks a business can undertake and assess whether a typical ETL tool can deliver against those needs.

To help us classify these needs we're going to use the "Data Quality Assessment" framework created by Arkady Maydanchik in the book of the same name.

Aside from being an essential read for any data quality practitioner, this book provides an exhaustive list of data quality rules that organisations should ideally be monitoring on a regular basis.

In short, this provides an excellent framework for our ETL benchmark.

After assessing each rule against the capabilities of known ETL tools, we gave each rule a simple compliance rating: compliant, partially compliant, or non-compliant. We discovered that nearly 70% of the rules Arkady covers in his framework can be fully supported by ETL tools, 23% can be partially supported, and only 7% cannot be supported at all.
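
To make that concrete, consider one of the most common rule types in the framework: the attribute domain constraint. Below is a minimal sketch, using Python and pandas as a stand-in for an ETL engine, of how such a rule maps onto the filter and conditional-split steps that virtually every ETL tool provides. The table, column names and valid values are all hypothetical:

    import pandas as pd

    # Hypothetical customer records arriving at an ETL transformation step.
    customers = pd.DataFrame({
        "customer_id": [1, 2, 3, 4],
        "status": ["ACTIVE", "CLOSED", "actve", None],
        "birth_date": pd.to_datetime(
            ["1970-05-01", "2030-01-01", "1985-11-23", "1960-02-14"]
        ),
    })

    # Attribute domain constraint: status must come from a reference list.
    VALID_STATUS = {"ACTIVE", "CLOSED", "SUSPENDED"}
    status_ok = customers["status"].isin(VALID_STATUS)

    # Attribute domain constraint: birth_date must not lie in the future.
    birth_ok = customers["birth_date"] <= pd.Timestamp.today()

    # Route failing rows to an exception table, exactly as an ETL
    # conditional-split step would.
    exceptions = customers[~(status_ok & birth_ok)]
    clean = customers[status_ok & birth_ok]
    print(f"{len(exceptions)} of {len(customers)} rows failed the rules")

Any mainstream ETL suite can express the same logic as an expression-based filter feeding an exception output.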

So What Next?

1. Understand the underlying principles of data quality assessment

Clearly, building a data quality framework with ETL is feasible, but you must build out from a robust data quality assessment framework. We recommend the one developed by Arkady Maydanchik, as we're aware of nothing that comes close for depth and completeness.

If you want to learn more, here are some options:

  1. Read some of Arkady's tutorials here on Data Quality Pro
  2. Get the book on Amazon
  3. Complete the online Data Quality Assessment tutorial provided by Arkady at eLearningCurve

2. Trial data quality improvement on high-payoff areas of the business

If your organisation has ETL licenses, spare development resources and a reasonable understanding of data quality rules management, you can begin to trial some data quality initiatives.

But where to start? The projects where ETL tools are typically deployed are a great place to go hunting.

I've lost count of the horror stories we've heard on our sister site Data Migration Pro, where teams ignore the need for sound data quality management and data discovery and dive straight into ETL development.

Our advice is to use the ETL tools earmarked for integration or migration development to first discover, validate, improve and control the data.
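
Discovery is usually the easiest win. A basic profiling pass, the kind of aggregate step any ETL tool can schedule, will surface empty columns, suspicious cardinalities and constant values before a single mapping is written. Here is a minimal sketch in pandas; the extract below is a hypothetical stand-in for your real source table:

    import pandas as pd

    def profile(df: pd.DataFrame) -> pd.DataFrame:
        """Per-column discovery: fill rate, cardinality, most common value."""
        rows = []
        for col in df.columns:
            s = df[col]
            rows.append({
                "column": col,
                "fill_rate": round(s.notna().mean(), 2),  # share of non-null values
                "distinct": s.nunique(dropna=True),       # cardinality
                "most_common": s.mode().iloc[0] if s.notna().any() else None,
            })
        return pd.DataFrame(rows)

    # Hypothetical extract standing in for a real source table.
    extract = pd.DataFrame({
        "customer_id": [101, 102, 103, 104],
        "postcode": ["SW1A 1AA", None, None, "M1 1AE"],
        "country": ["UK", "UK", "UK", "UK"],  # constant column: a red flag
    })
    print(profile(extract))

The constant country column and the half-empty postcode column are exactly the sort of findings that would reshape a migration mapping before development starts.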

Already finished your data migration or data integration? Why not use your ETL tool to monitor ongoing data quality health in the new target environment?

There are literally hundreds of scenarios where you can deploy an ETL tool as a data quality workhorse. Monitoring data quality rules along an entire customer service fulfilment chain, for example, is a great place to start, as the sketch below illustrates.
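
Here is what such monitoring might look like, assuming a nightly orders extract; the rules and the alert threshold are hypothetical:

    import pandas as pd

    # Hypothetical nightly extract of orders moving through the chain.
    orders = pd.DataFrame({
        "order_date": pd.to_datetime(["2023-03-01", "2023-03-02", "2023-03-03"]),
        "dispatch_date": pd.to_datetime(["2023-03-02", "2023-03-01", "2023-03-04"]),
        "postcode": ["SW1A 1AA", None, "M1 1AE"],
        "quantity": [2, 1, 0],
    })

    # Each rule is a boolean expression, the bread and butter of ETL tools.
    rules = {
        "dispatch_after_order": orders["dispatch_date"] >= orders["order_date"],
        "postcode_present": orders["postcode"].notna(),
        "quantity_positive": orders["quantity"] > 0,
    }

    summary = pd.DataFrame({
        "rule": list(rules),
        "pass_rate": [mask.mean() for mask in rules.values()],
    })

    # Flag any rule dipping below a hypothetical 98% pass-rate threshold.
    summary["alert"] = summary["pass_rate"] < 0.98
    print(summary)

Scheduled after each load, a summary like this gives you pass-rate trends per rule at next to no cost.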


3. Focus on creating capital to re-invest in ongoing data quality improvements

At the start of this post we made the point that, long term, it really does pay to invest in a full-blown data quality platform: the universe of data quality rules you can manage with specialist tools is clearly greater than what ETL tools can cover.

What we have prescribed here is a simple "stepping-stone" approach.

By leveraging ETL tools already in your organisation, you can target the low-hanging fruit, which can provide much-needed cash for the additional staff and specialist technology that will help you expand your data quality initiatives.

Case in point: one telecommunications company in the UK did exactly what is prescribed in this post. They looked at their most business-critical operations and identified several key processes that were experiencing data quality failures at different stages of the service chain.

They implemented an improvement programme using an ETL tool they were already familiar with and had spare licenses for. They modelled data quality rules and business process rules across the full life cycle of expensive fibre provisioning and created a management reporting dashboard that highlighted anomalies.
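
For illustration only, a dashboard feed of that kind can be as simple as aggregating rule results by process stage so the anomalies stand out. The stage and rule names here are hypothetical stand-ins, not the company's actual rules:

    import pandas as pd

    # One row per rule evaluation, as produced by the monitoring job.
    results = pd.DataFrame({
        "stage": ["survey", "survey", "build", "build", "activation"],
        "rule": ["site_ref_valid", "site_ref_valid", "equipment_logged",
                 "equipment_logged", "service_id_unique"],
        "passed": [True, False, True, True, False],
    })

    # Count checks and failures per stage and rule for the dashboard.
    dashboard = (
        results.groupby(["stage", "rule"])["passed"]
               .agg(checked="count", failed=lambda s: (~s).sum())
               .reset_index()
    )
    print(dashboard.sort_values("failed", ascending=False))

From there, any reporting layer can render the failure counts per stage as the kind of anomaly dashboard the team used.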

The cost savings were dramatic. Not only did customer retention increase as they eliminated process defects, they also slashed scrap and rework costs and recovered spare equipment that would previously have been re-purchased at considerable cost. All of this created significant capital that could be re-invested in other improvement initiatives.