Data Quality Best-Practices for Data Migration, Featuring John Morris


One of the principal causes of data migration failure is a lack of effective data quality management across all aspects of the project.

To help you reverse the trend of project failure, Dylan Jones (editor of Data Quality Pro) and John Morris, (author of Practical Data Migration) got together to help share industry best practices around data quality within a data migration project.



Dylan Jones: Before we start, let’s recap some of the best-practices we’ve covered so far, to help people who may be new to Data Migration and the importance of Data Quality Management.

John, could you provide a quick overview of your approach to Data Quality Rules, as we’ve covered that area in some detail in previous sessions?

John Morris: Yes, I’ve just been reading Arkady Maydanchik’s Data Quality Rules approach in Data Quality Assessment. Although his approach is similar to mine, there are differences, so I’ll explain what I mean by Data Quality rules.

A Data Quality rule is both the process that checks the quality of the data and the process that handles the Data Quality errors. It combines a set of procedures and forms, and it deeply involves the business side of the project.

As this call will no doubt show, the route to success on a data migration is to involve the business completely.

Data Quality rules are a way of measuring and enhancing your data to a point at which it is fit for migration.

Most of us will recognise that perfect Data Quality on a project is unattainable; what you need to do is get the data to the right level of quality at the right point and at the right time.

What’s important is that we have a ‘Zero Unplanned Defect’ migration. Defects can be acceptable, but they must be planned and tracked according to their prioritisation and impact on the target system.
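To make that concrete, here is a minimal sketch in Python of the shape a Data Quality rule might take. The class, field names and the example postcode rule are illustrative assumptions of mine, not PDM’s prescribed procedures and forms:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DataQualityRule:
    """Pairs a quality check with the remediation path and priority
    needed to plan and track every defect it finds."""
    name: str
    check: Callable[[dict], bool]       # returns True if a record passes
    remediation: str                    # agreed business action for failures
    priority: str                       # e.g. "fix-before-migration"
    defects: List[dict] = field(default_factory=list)

    def apply(self, records: List[dict]) -> None:
        for record in records:
            if not self.check(record):
                self.defects.append(record)   # logged, never silently dropped

# Hypothetical rule: customers must have a postcode before migration.
rule = DataQualityRule(
    name="customer-postcode-present",
    check=lambda r: bool(r.get("postcode")),
    remediation="Business team to source postcodes from paper records",
    priority="fix-before-migration",
)
rule.apply([{"id": 1, "postcode": "CF10 1AA"}, {"id": 2, "postcode": ""}])
print(f"{rule.name}: {len(rule.defects)} planned defect(s) to track")
```

The point is that every failing record is captured against an agreed remediation and priority, which is what makes a defect planned rather than unplanned.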

Dylan Jones: How can we get more organisations to adopt best-practices in Data Migration?

John Morris: There are a number of sides to this issue.

Firstly, how do you, or your customers, build a business case for a correct approach to data migration?

One of the best starting points is to gain evidence from reports such as the Bloor data migration report.

This survey shows that 84% of data migration projects overrun their time or budget. Look at the amount of time a typical data migration project overruns and at your typical monthly cost, or burn rate.

If that burn rate is, say, $2 million a month, you can quickly see what the impact will be if a project slips by something like 5 months. These figures are crucial for the business case and demonstrate what will happen when best-practice is not followed.

The impact of overrun will be different depending on your circumstances. I have one client, a publisher using internal resources, who is less concerned than, say, another of my clients who has strict regulatory requirements to hit a target date.

In any event, any delay results in a delay to the target system, which increases the cost further because the organisation is unable to realise the business benefits of the new system, such as decommissioning legacy systems and launching new business services.
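As a back-of-the-envelope illustration of that arithmetic, a few lines of Python using the hypothetical figures above:

```python
# Hypothetical figures from the discussion above.
monthly_burn_rate = 2_000_000    # $2m per month running cost
overrun_months = 5

direct_overrun_cost = monthly_burn_rate * overrun_months
print(f"Direct cost of a {overrun_months}-month slip: ${direct_overrun_cost:,}")
# -> $10,000,000, before counting the deferred business benefits
#    (delayed decommissioning savings, delayed new services, and so on)
```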

At Iergo we launched a free training package to help people, particularly non-technical people, understand the different skills required to deliver a data migration.

I don’t find it so much of a problem talking to technical people who have been through the data migration process; most don’t want to go through it again! They realise how painful it can be if it goes wrong. These people can use that experience to create a compelling business case that initially focuses on delivering an early scoping exercise.

The second thing is that if you are unable to get a budget to carry out the full project, you must carry out a landscape analysis to identify legacy data issues prior to the migration project kick-off.

Dylan, what are your thoughts on the Landscape Analysis technique, as this is a term and discipline you originally conceived for data migrations?

Dylan Jones: Good question. I’ll briefly explain for the benefit of those who haven’t read your book or any of my articles on the subject.

In essence it’s about a “look before you leap” mentality to data quality on data migration projects.

I often speak to companies who say they expect their project to finish in “about 12 months’ time” simply because that’s what the project plan indicates; there is no actual evidence to support that figure, nor any formal investigation behind it.

The whole aim of landscape analysis is to carry out a “migration simulation” well in advance of the actual migration. On most migrations, analysis activities typically take 10-20% of the elapsed time, with the rest spent building the ETL and “nuts-and-bolts” integration work.

I prefer to spin that around and have the bulk of the time spent analysing, with a much simpler and faster integration process as a result. Landscape analysis requires standard tools that are common in the Data Quality industry; things like profiling and discovery tools frequently come into play.

I typically like to implement landscape analysis before the project has formally begun, because one of the biggest problems in migrations is that no-one knows how long it will take or indeed whether it will actually work.

In one example, a past client wanted to collapse 17 systems into one and asked for a quote for the migration. Rather than progress straight into the migration, we carried out landscape analysis to simulate how it could proceed. This was a classic example of a business sponsor being pushed to proceed without realising the technical impact of the quality and structure of the disparate systems. There is a danger for all concerned in proceeding without the knowledge gained during landscape analysis: if you’re an integrator, you could get sued for failing to deliver, and if you’re the customer, the wider business transformation suffers greatly.

It’s important when putting your business case together to focus not just on the time and money benefits of doing data migration correctly but also on the personal implications for sponsors.

I often use the acronym STRIFE:

  • Stress

  • Time

  • Risk

  • Income

  • Fear

  • Expense

You need to get across to the person or persons sponsoring the project that there is considerable personal impact, as well as business impact, in deciding to ignore best-practice and techniques like landscape analysis.

In terms of landscape analysis, I personally recommend a Data Quality tool that offers at least:

  • Cleansing

  • Matching

  • Profiling

  • Data Quality Rules Management

  • Dashboard monitoring

A lot of people just use data profiling tools, and I find these tools often run out of capability when it comes to finding the more complex Data Quality rules. Landscape analysis is about finding as many rules and defects as possible in the shortest timeframe; by using cleansing and matching you can perform a more effective migration simulation.

At the start of landscape analysis I pick out some of the key business objects, such as customers, accounts and products. We then explore how easy it will be to migrate a subset of the overall business data, since our cleansing and matching tools give us all the transformation functionality required. There is no need at this stage to even purchase a migration tool; a Data Quality tool with the right features will suffice, as we’re just looking for big issues at this stage.
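As a sketch of what that first profiling pass over a key business object might look like, here is a minimal example in Python with pandas; the column names and data are invented, and a real landscape analysis would lean on dedicated profiling, cleansing and matching tools:

```python
import pandas as pd

# Invented extract of one key business object (customers).
customers = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "name": ["Acme Ltd", "Beta plc", "Beta plc", None],
    "postcode": ["CF10 1AA", "", "CF24 0AA", "SW1A 1AA"],
})

# Basic profiling: completeness, key uniqueness and candidate duplicates.
print(customers.isnull().mean())                    # null rate per column
print(customers["customer_id"].is_unique)           # should be True; it isn't
print(customers[customers.duplicated("name", keep=False)])  # possible matches
```

Findings like duplicate keys and missing names then become candidate Data Quality rules to be prioritised with the business.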

Going back to my previous story, using this technique we helped a customer realise their original migration plan was not actually feasible. The sponsor subsequently realised how beneficial it was for them to have this information. It gave them more options for planning and implementing a different approach. 

Landscape Analysis provides a rapid sweep across the migration landscape to identify what resources, budgets, tools and techniques will be required in the project. In my toolkit I am increasingly relying on online management portals to record information and create more collaborative working. 

Landscape Analysis is a gradual drilling-down analysis, involving the business continuously, leading neatly into John’s Data Quality rules process. 

John Morris: So basically, what we’re saying is that once we’ve scared them with some data, we can start to put together an argument for best-practices! 

There are two sides to best-practice; the first is the technical side of tools and technology. Tools have come on considerably, and I would not recommend any project commence a migration without the right ones.

I think you need to look at an appropriate ETL or migration tool. There are now even bidirectional synchronisation tools that give you more migration options. Of course, you also need to include a data quality management tool and, as Dylan mentioned, project portals are increasingly important.

You also need scheduling and migration controllers to orchestrate the actual migration delivery.

On the migration project side, you need to structure your project to align with the other projects, and you need to master business engagement. I always tell people that even though you have gained budget and sign-off, you still need to sell the importance of data migration to the wider business and to aligned projects.

This is a continuous process; on a recent project I had to “re-sell” the data migration process to various groups in the business, so remember that the selling never stops.

Dylan Jones: Just reviewing the member questions, someone asked what would be included in a Migration Readiness Assessment (MRA). I think we’ve touched on this with Landscape Analysis, but with an MRA it’s also important to look at how your project will align with and impact other projects, as data migrations never happen in isolation. John, do you have any thoughts on what to include in an MRA?

John Morris: First, let’s split the challenge in two – the technical and the project side.

We’ve obviously discussed Landscape Analysis and this is critical for scoping the amount of work required. 

On the project side, when we do MRAs at Iergo it really depends on where we’ve been brought in. The first thing we typically look at is the end-to-end migration design:

  • What format is it – big bang, agile or iterative?

  • Are they using Waterfall, PRINCE2, etc.?

  • How will this influence the end-to-end migration design?

There are training restrictions, geographical restrictions, windows of opportunity and various other factors that also need to be considered.

We also examine data source selection, as there is a tendency, exacerbated by tools, to focus only on RDBMS-type data stores. We ensure that the data migration approach examines all data sources, even paper-based collateral, as this can actually prove to be more complete and accurate than the more traditional systems.

Another thing to focus on is business engagement. No-one understands the data better, or is more affected by the migration, than the people who use the data every day.

You therefore need to ensure your business engagement or business readiness strategy addresses things like:

  • Carrying across enough history

  • In-flight transactions

  • What audits people are going to need

  • Whether people will be happy with the user acceptance testing

  • Whether people will need more “sandpit” time to get familiar with the new application

All those things can stop you from migrating, because if people say they are not ready, they gain the upper hand.

 Business engagement is therefore crucial.

And of course there is the approach you will take for data quality management throughout the migration…

  • How are you going to fix defects?

  • What priority strategy will you adopt?

  • What does the authority and governance consist of?

At Iergo we prefer the data quality rules approach that we cover in “Practical Data Migration” but certainly you need to have considered how you are going to prioritise.

I was working with one client who was taking data out of a system into its clone, and even that didn’t work because of referential integrity rules. You really need to bring out the transitional business processes required during the migration transition phase.
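A minimal sketch of the kind of referential integrity check that catches such failures before load; the tables and column names are assumptions for illustration:

```python
import pandas as pd

# Invented extracts: orders reference customers via customer_id.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [101, 102, 999]})
customers = pd.DataFrame({"customer_id": [101, 102]})

# Orders whose customer_id has no parent row in customers would be
# rejected by the target system's referential integrity constraints.
orphans = orders[~orders["customer_id"].isin(customers["customer_id"])]
print(orphans)   # order 3 references customer 999, which does not exist
```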

That gives you a flavour of the things we look for. Typically you find some of them are done well; often they are missing entirely. If we align this with landscape analysis, it allows us to scope and cost the project, the likelihood of success and the timescales involved.

Dylan, can you add anything to this?

Dylan Jones: It does depend on where you come into the project and where you do the readiness assessment; most companies go straight into the migration.

I think the MRA should be accompanied by a sample migration data assessment, or migration simulation, to validate some of the resource estimates already created. In a recent project, a client’s plan indicated they would be doing 3 system migrations a month for 6 months; looking at the people and tools they possessed, it was clear it wasn’t going to work from a resources perspective, irrespective of data quality.
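A trivial capacity check makes the point; every figure below is hypothetical:

```python
# Hypothetical capacity check for a plan of 3 system migrations a month.
planned_migrations_per_month = 3
analyst_months_per_migration = 2.5   # assumed effort per system migration
analysts_available = 4

required = planned_migrations_per_month * analyst_months_per_migration
print(f"Required: {required} analyst-months per month; "
      f"available: {analysts_available}")
# -> 7.5 analyst-months of work per month against 4 available:
#    the plan fails on resources before data quality is even considered.
```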

Another pitfall companies fall into is coming up with a migration strategy that does not fit the technicalities of the new target environment they are moving to. A good example: a lot of the new COTS systems that are very popular at the moment have quite complex gateways and APIs. You have to pass data through various SOA interfaces; there are no traditional database connectors for rapid loads.

This ultimately means transaction load rates are low. A lot of integrators assume they can use high-performance ETL products to load massive volumes in a short window of opportunity, and that is no longer an option, which is why we are seeing the emergence of real-time migration offerings with synchronisation and so on.

This is where an MRA becomes useful: by looking at those future target systems, we can assess whether a “big-bang” migration over a bank holiday is feasible. Many environments run 24×7, so there is no downtime window, and other options are required from the outset.
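A back-of-the-envelope feasibility check for that kind of scenario; the volume, per-transaction latency and window are illustrative assumptions:

```python
# Illustrative assumptions for a big-bang load through a SOA/API gateway.
records_to_load = 5_000_000
seconds_per_transaction = 0.5     # assumed per-record latency through the API
window_hours = 72                 # e.g. a bank holiday weekend

load_hours = records_to_load * seconds_per_transaction / 3600
print(f"Estimated load time: {load_hours:,.0f} hours "
      f"vs a {window_hours}-hour window")
# -> roughly 694 hours: the big-bang window is not feasible, so phased or
#    real-time synchronisation approaches are needed instead.
```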

It’s also about risk, which is essentially the likelihood of a problem arising combined with its impact on the project.

If we go back to the Bloor Data Migration Report we see that 84% of companies are failing to deliver on time and on budget.

Now, that may be due to the wrong strategy and inexperience, but it could also be that they simply don’t know how to scope and budget these projects. If they scoped them correctly, using an MRA at the outset, you might actually find these projects coming in on time and on budget against the more accurate estimate.

It’s the way companies are forecasting that is at fault as well as the lack of best practices.

I’m seeing a few companies get a lot smarter about the way they forecast and estimate their projects.

I am seeing a lot of great best-practices coming through into Data Migration Pro and the Data Migration Matters events that you are running, and I do think awareness is growing, so I’m positive things will improve.


About the Interviewee - John Morris

Johny Morris has 25 years’ experience in Information Technology, spending the last 12 years consulting exclusively on Data Migration assignments for the likes of the BBC, British Telecommunications plc and Barclays Bank.

His book “Practical Data Migration” sets out what is widely regarded as the leading Data Migration methodology, PDM. Johny co-founded the specialist data migration consultancy Iergo Ltd and is on the expert panel of Data Migration Pro.

http://www.iergo.co.uk
