How to Use Data Migration as a Springboard for Ongoing Data Quality Management
Data migration projects have a clear need for sound approaches to data quality management and governance, yet all too often the infrastructure created during the migration is wasted when the project ends.
Bryn Davies of Infoblueprint, a South African consultancy specialising in Data Quality, Governance and Data Migration services, provides a range of tips for helping project teams to "bed down" data quality and governance capabilities that are sustained long after the migration is completed.
Dylan Jones: Thanks for taking time out to talk, Bryn. Let’s start by looking at some examples of recent data migration projects. Why were you folks called in, and what was the challenge the company faced?
Bryn Davies: Typically what we have seen is that companies have burnt their fingers on recent application implementations due to having little or no focus on data quality. The project runs over time and budget mainly due to data quality issues in the target application, and this of course negatively affects user and management acceptance of the new system from the get-go.
Very often management has been mistakenly sold on the idea that a new application, besides all the other reasons for implementation, will also “sort out our data problems”.
Sadly, as many of your readers will no doubt testify, without a formal approach to data quality from early on in the project this clearly will not somehow magically happen.
So there appears to have been a growing recognition and acknowledgement of this and the need to budget for it and bring it in as a formal stream to the next project.
Of course, by building better controls and processes for data quality during the migration you're naturally going to create a stronger platform for extending out a data quality framework and governance structure post-migration.
Dylan Jones: Do you find that some Systems Integrators prefer to leave the data quality aspects of a data migration or transformation in the hands of the customer?
Bryn Davies: Yes, that is typically the case – somewhere in their project charter they will exclude this as not their problem and push it back on the customer. It is a risk to their project milestones and payment schedule, so it’s safer to just leave it out.
Whilst the SI or their subcontractor will have to do the data mapping and transformation work, this typically only covers enough to make the data physically “fit” into the target models, regardless of whether it is rubbish or not. This is particularly the case for master data, and to a lesser extent reference data.
The problem of course is that most organisations don't truly understand their data and the issues that lie beneath, and they also lack the appropriate skills, methodologies and software to manage the data quality activities.
The fact is that data in the target doesn't just need to work structurally, it also has to serve the business purposes of the new application, which often includes new capabilities that the users have been sold on.
If the source data quality has not been assessed as to whether it can properly support these new target functions, and then not corrected or improved during the migration, then it’s a recipe for disaster.
Dylan Jones: Based on your data migration experience, what are some of the pitfalls you see organisations making when they attempt to tackle data quality during a data migration project without the necessary skills and technology?
Bryn Davies: Without experienced planning and foresight leading to appropriate support structures, sponsorship, change management and oversight, data quality, and in particular data cleansing during a migration, is approached haphazardly and generally lags seriously behind the rest of the project, ultimately leading to project overruns and even failure.
Data migration is often last on the project agenda, but in reality it should be addressed far earlier in the overall transformation project. The sooner source data quality is assessed through generic and business-rule-specific data profiling, the earlier the red flags come up. This allows for proactive and effective management of data issues before they cause real headaches downstream in the design and build phases of the migration.
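To give a flavour of what generic and business-rule-specific profiling means in practice, here is a minimal sketch in plain Python. The record layout, the `account_no` format rule and all thresholds are invented for illustration; real data quality tools provide far richer profiling out of the box.

```python
# Minimal data-profiling sketch: per-column completeness, distinctness,
# and business-rule failure counts. All columns and rules are illustrative.
def profile(rows, rules):
    """rows: list of dicts; rules: {column: predicate} business rules."""
    columns = rows[0].keys()
    report = {}
    for col in columns:
        values = [r[col] for r in rows]
        filled = [v for v in values if v not in (None, "")]
        report[col] = {
            "completeness": len(filled) / len(values),
            "distinct": len(set(filled)),
            "rule_failures": sum(1 for v in filled
                                 if col in rules and not rules[col](v)),
        }
    return report

# Example: profile a few customer records against one format rule.
customers = [
    {"name": "Acme", "account_no": "AC-001"},
    {"name": "",     "account_no": "AC-002"},   # missing name
    {"name": "Beta", "account_no": "12345"},    # breaks the format rule
]
rules = {"account_no": lambda v: v.startswith("AC-")}
report = profile(customers, rules)
```

Even a toy report like this surfaces the red flags (incomplete names, malformed account numbers) as facts rather than anecdotes, which is exactly what early planning needs.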
Of course technology plays an important enabling and productivity role, and its importance cannot be overstated. Without capable data profiling, data discovery and data quality software, all data engineering and associated project controls invariably get done manually in a mix of SQL, Excel or whatever mixed bag of tools is available.
This disjointed and outdated approach to data quality technology is always difficult to control and leads to even more errors, inconsistent data cleansing and standardisation and unreliable or non-existent matching and de-duping.
Whilst ETL tools are often readily present in a data migration, they generally fall short of the richer functionality found in proper data profiling, cleansing, fuzzy matching and data quality reporting technologies. Even with such tooling for cleansing and de-duping, there will always be a need for manual remediation, whether in the source or staging area, or to cater for those issues that simply cannot be dealt with programmatically.
For manual remediation an effective data steward interface into the data quality tool is essential, as it guides a controlled and formally agreed workflow for manual resolution within the tool, instead of in yet another out-of-control spreadsheet.
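As a flavour of the fuzzy matching described above, a toy de-duping pass can be sketched with Python's standard `difflib`. The 0.85 similarity threshold and the sample names are illustrative assumptions; commercial matching engines use much more sophisticated normalisation and scoring, and candidate pairs here are surfaced for steward review rather than merged automatically.

```python
from difflib import SequenceMatcher

def find_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalised similarity meets the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a = names[i].lower().strip()
            b = names[j].lower().strip()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Candidate duplicates survive as suspects for manual steward resolution,
# not automatic merging.
suspects = find_duplicates(["InfoBlueprint Ltd", "Infoblueprint Ltd.", "Acme Corp"])
```

The point of the sketch is the workflow, not the algorithm: fuzzy matching proposes, and a controlled steward process disposes.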
Dylan Jones: Is there any advice for some approaches that can be taken during the migration that will make it easier to mature data quality and governance capabilities post-migration?
Bryn Davies: Firstly it is very important to campaign and sensitise end users as early as possible in the project planning phases about the pending data migration and associated data quality requirements. They also need to understand what roles they may be expected to play in helping to resolve data issues going forward.
The programme’s change management team is a very useful ally in this communication and on-boarding process. They should also be educated about what data migration means and why quality data is critical. Then during the project, manual data cleansing by end users or by a dedicated team must be carefully controlled by the data migration stream – this requires artefacts to, for example, show data cleansing progress and how identified data load risks are reducing as the cleanse proceeds.
In order to instil a permanent awareness in the users involved, project artefacts should be designed and published so that they can serve as the basis for on-going permanent data quality reporting and monitoring post go-live.
It is also helpful to introduce a fun element by, for example, branding the data quality programme and having prizes for individuals or teams who perform the best in the cleansing process.
Dylan Jones: I think that's a great idea Bryn, data migrations can be extremely stressful for project teams and you're often placing heavy demands on the business users so injecting a sense of fun and "we're all in this together" is a great idea.
Any other tips on ensuring data quality is sustained post migration?
Bryn Davies: Yes, well in one project we produced a user friendly “data quality handbook”. The purpose of this was to act as a guide which educated the business. It was adopted and enhanced post go-live to be used in the particular business area on an on-going basis.
Using these types of techniques consistently and visibly throughout the data migration leads naturally to the evolution of some of the core foundations of data governance, such as data stewardship, because for the first time individuals are being made accountable for data in a controlled and monitored environment.
As you're navigating through the data migration you're invariably uncovering these wonderful subject matter experts and data owners who are critical to future data quality success. By integrating them into the data quality framework of the data migration you stand a far greater chance of getting them involved post-migration.
On large programmes you invariably start to touch on the big issue of master data shared across the enterprise because your data migration analysis exercise, if you follow best-practice, forces you to work out the data lineage and provenance of your data. This exercise leads to a practical understanding of the need for the upper echelons of a data governance organisation.
Dylan Jones: What has been the reaction from senior management within your customer organisations? Do they see the value of leveraging data quality post-migration or is there any initial rejection or blockading?
Bryn Davies: Relevant management attends project progress sessions and steering committees throughout the project, so it is important to use these opportunities to market the data quality work taking place.
For example, the readily available data quality reporting and monitoring artefacts we create are often the first time managers see real clarity around their data quality metrics. Yes, there are often some nasty surprises in the data but once managers see the value of the resources we create then it's not difficult to get them onside.
Inevitably, management are also involved in some of the more important data decisions needed by the data teams and stewards as we work together to identify strategies for data quality remediation and so the education process is organic.
Progress meetings are also opportunities to proactively raise the most critical data issues from a business impact perspective, backed by their top of mind expectations of the new system. As long as these opportunities are leveraged and the communication occurs then the support will be there post migration.
Dylan Jones: What kind of artefacts are created as part of your data quality specific activities?
Bryn Davies: This varies but typically we create things like a data quality rule register, business glossaries, source-to-target mapping specifications, data quality remediation plans, monitoring reports and various data governance related artefacts.
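As a hypothetical illustration of the first of those artefacts, a data quality rule register can start life as something as simple as a structured, machine-checkable list. The rule ids, owners and checks below are all invented for the example; real registers are typically held in the data quality tool itself.

```python
# A toy data quality rule register: each entry names a rule, the column it
# governs, an accountable owner, and an executable check. All illustrative.
RULE_REGISTER = [
    {"id": "DQ-001", "column": "email", "owner": "CRM steward",
     "description": "Email must contain an '@'",
     "check": lambda v: "@" in v},
    {"id": "DQ-002", "column": "country", "owner": "Finance steward",
     "description": "Country must be a 2-letter upper-case code",
     "check": lambda v: len(v) == 2 and v.isupper()},
]

def apply_register(record, register):
    """Return the ids of the rules that the record violates."""
    return [rule["id"] for rule in register
            if rule["column"] in record
            and not rule["check"](record[rule["column"]])]

violations = apply_register({"email": "bryn.example.com", "country": "ZA"},
                            RULE_REGISTER)
```

Because each rule carries an owner, the same register that drives migration checks can feed the post-go-live monitoring reports the interview describes.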
Dylan Jones: Finally, in hindsight, after completing many migrations, are there any steps or measures you wish you could go back and introduce?
Any wrap-up lessons you can share for other companies who want to leverage data quality post-migration?
Bryn Davies: Dylan, we've been doing these migrations for so long I should probably write a book, but here are some takeaway tips that your readers should find useful:
- Start the data migration and data quality discussion as early as possible, preferably before the project even kicks off.
- Point out the expected shortcomings due to the SI’s almost certain exclusion of data quality remediation.
- Engage management on the business level early on, ensuring that there is an appreciation for the funding that will be required to do the data migration properly.
- Profile and measure the quality of your data within the source systems early during project planning, even if the scope is not yet clear, so that there is factual input to the planning and further phases, rather than the usual guesstimates and the all-time classic line: “it works in the source system so it should be OK”.
About Bryn Davies
Bryn is currently co-owner and Managing Director of InfoBlueprint, a company dedicated to providing leading professional services in the fields of Data Quality, Master Data Management (MDM), Data Migration and Data Governance.
He has authored a number of articles on the subject of data quality and data migration and presents regularly at industry events.