Many moons ago I was on a customer site when a strange thing occurred that I think is symptomatic of why many companies struggle with data quality.
A data quality manager asked me not to implement a particular data quality activity because it wasn’t supported by their current platform.
We had created a bespoke piece of software that performed a particular type of data profiling that wasn’t found in their mainstream data quality product. The bespoke code was being used as a validation service for an asset management system. Their objection was that “IT” couldn’t support it because:
- IT did not create it (I hacked it together in PL/SQL)
- It didn’t use their corporate standard (which was essentially an enterprise data quality tool)
The business intervened, things ping-ponged and finally the code was accepted.
Business vs IT - Are we there yet?
Whether we still have this Business vs IT divide is questionable. In this article, Joe McKendrick points to SOA as a vehicle for eliminating the barriers and contentions between Business and IT so that they become “as one”. I think a barrier between IT and “the Business” still exists in many companies, but I also think the SOA model and mindset increasingly applies to Data Quality.
We’ve recently formed a fledgling think-tank called the “Data Quality and Data Migration in the Cloud Working Group” - OK, not the snappiest title but you get the idea. The aim of the group is to figure out what “cloud” and “web services” mean to disciplines like Data Quality and Data Migration.
New solutions to old problems
One of the key areas I want to explore is how organisations can discover new ways to deploy data quality services without the dependency on archaic IT policies. I think Service Oriented Data Quality fits in well here.
Organisations should be able to adopt the most eligible Data Quality service for a particular function, regardless of whether it comes from vendor A or B. The reality is that some vendors do profiling incredibly well but suck at name and address matching. Others have masses of experience dealing with country address data but have zip all when it comes to product names.
So we’re increasingly seeing a “mix and match” capability emerging that has been accelerated by the advent of web services and cloud computing. This doesn’t fit the “one size IT solution fits all” model and nor should it.
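To make the "mix and match" idea concrete, here is a minimal Python sketch of a registry that routes each data quality function to whichever provider handles it best. Everything in it is an assumption for illustration: the function names `vendor_a_profile` and `vendor_b_match_names`, the `SERVICES` registry and the toy logic inside each are hypothetical stand-ins, and in a real deployment each entry would wrap a call to a vendor's web service rather than run locally.

```python
from typing import Callable, Dict


def vendor_a_profile(values: list[str]) -> dict:
    """Hypothetical profiling service: count total, empty and distinct values."""
    non_empty = [v for v in values if v.strip()]
    return {
        "total": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
    }


def vendor_b_match_names(values: list[str]) -> dict:
    """Hypothetical matching service: group names equal ignoring case and spacing."""
    groups: dict[str, list[str]] = {}
    for value in values:
        key = " ".join(value.lower().split())
        groups.setdefault(key, []).append(value)
    return {"duplicate_groups": [g for g in groups.values() if len(g) > 1]}


# Each data quality function maps to the provider chosen for it.
# Swapping vendors means changing one entry, not every caller.
SERVICES: Dict[str, Callable[[list[str]], dict]] = {
    "profiling": vendor_a_profile,
    "name_matching": vendor_b_match_names,
}


def run_service(function_name: str, values: list[str]) -> dict:
    """Look up the provider registered for a function and call it."""
    try:
        service = SERVICES[function_name]
    except KeyError:
        raise ValueError(f"No service registered for {function_name!r}") from None
    return service(values)


if __name__ == "__main__":
    print(run_service("profiling", ["a", "", "a", "b"]))
    print(run_service("name_matching", ["John Smith", "john  smith", "Jane"]))
```

The point of the sketch is the indirection: callers ask for "profiling" or "name_matching" and never need to know which vendor sits behind it.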
Vendors have to re-think their business model and observe how some of the latest solution providers on Data Quality Pro are offering services.
Companies like Listpoint, Socium, Loqate, Experian Data Quality, Instant DQ and Uniserv, for example, now support a web-enabled business model. I think the writing is on the wall: the days of the big on-premise sale may be numbered.
Benefits of “Service Oriented Data Quality”
One of the big benefits of Service Oriented Data Quality (particularly via web services) is re-use. If you come across a great data quality validation service, you no longer face the chore of installing that code across hundreds of different data entry points; you can simply make that service available to whichever functions need to call it.
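As a rough illustration of that re-use, the Python sketch below keeps the validation rules in one place and has two different entry points call them. The field names (`asset_id`, `purchase_cost`), the rules and the function names are all hypothetical, chosen only to echo the asset management example earlier; in practice `validate_asset` would sit behind a web service rather than be a local function.

```python
def validate_asset(record: dict) -> list[str]:
    """Single shared validation routine; returns a list of error messages."""
    errors = []
    if not str(record.get("asset_id", "")).strip():
        errors.append("asset_id is required")
    cost = record.get("purchase_cost")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(cost, (int, float)) or isinstance(cost, bool) or cost < 0:
        errors.append("purchase_cost must be a non-negative number")
    return errors


def save_from_entry_form(record: dict, store: list) -> list[str]:
    """Entry point 1: a data entry screen saving one record at a time."""
    errors = validate_asset(record)
    if not errors:
        store.append(record)
    return errors


def load_batch(records: list, store: list) -> list:
    """Entry point 2: a bulk load that rejects bad records and keeps good ones."""
    rejected = []
    for record in records:
        errors = validate_asset(record)
        if errors:
            rejected.append((record, errors))
        else:
            store.append(record)
    return rejected


if __name__ == "__main__":
    store: list = []
    print(save_from_entry_form({"asset_id": "P-100", "purchase_cost": 250.0}, store))
    print(load_batch([{"asset_id": "", "purchase_cost": 10}], store))
```

Because both entry points call the same routine, a rule change is made once and applies to the form and the batch load alike.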
As Henrik Sørensen pointed out in 2009 in his article “Service Oriented Data Quality”, this could help those validation routines downstream be adopted further upstream for root-cause prevention:
“I have seen lots of well cleansed data never making it back to the sources or only being partially updated in operational databases. And even then a great deal of the updated and cleansed data wasn’t maintained or prevented from there.”
Let’s welcome SMBs to the fray
I think where this gets really interesting is for SMBs who have previously been priced out of the market due to the traditional cost of enterprise data quality tools. As more vendors adopt a cloud-based model, SMBs can start to weave some of these solutions into their own “Service Oriented Data Quality” frameworks. This will make them far more competitive and agile (because that’s what data quality improvement delivers).
Rise of the data quality services marketplace
Marketplaces are a key element here and my hope is that sites like Data Quality Pro can showcase the various data quality “apps” becoming available so that companies can quickly compare features and, of course, try them out with minimal effort and investment. Our new Technology Briefing feature has started the ball rolling so watch this space.
In summary and beyond
So, I think the future is bright for Service Oriented Data Quality and Data Quality via the web. Is this pie in the sky or an absolute necessity? What are the benefits of Service Oriented Data Quality? Is this even the correct term?
What do you think? Please continue the debate below.