In this guest post, William Sharp of The Data Quality Chronicle provides some practical advice for leveraging data modelling at the outset of a data quality activity.
In today’s world of increasing feature sets it is easy to become bedazzled by the latest instalment of new functionality. Data quality software is no different from other enterprise applications in this regard. However, it is more important than ever that users of data quality software know the data they are analyzing in order to fully leverage the tool and deliver good quality analysis.
First things first
With all the advances in data quality software it is easy to become enamoured with the bells and whistles and thus lose sight of the fundamentals. As these rich feature sets enable less technically oriented business users to participate in data quality exercises, the fundamentals of data analysis become more, well, fundamental.
One of my first questions when I start a new data quality project is, “Does anyone have a data model?” Sometimes I get lucky and the DBA has one. Sometimes, when I am really lucky, the DBA also has a data dictionary. I don’t play lotteries due to my lack of being “really lucky”, if you know what I mean.
If I get a data model, I sit and examine it like a CSI agent does blood spatter. It is, after all, my roadmap to solving some mysteries. If there is no data model available, I dig in and create one. Most times I start out with pencil and paper (I know, how archaic!). Most data quality projects involve one of two main entities: the customer or the product. As such, I use this entity as my starting point.
For the sake of simplicity, let’s concentrate on a customer-focused data quality initiative. Commonly referred to as Customer Data Integration, or CDI, these projects involve the most critical person in any business: the customer! Customers are a complex animal, particularly from a data perspective. Organizations often focus on collecting as much data regarding customers as possible, and rightfully so. As a result, there is usually a fair amount of data in, or related to, the customer entity.
What works for me is to start out with a big picture and narrow my focus with further analysis. My first step is to build what I call my “customer frame”. The customer frame consists of the customer entity and each entity to which it is related. In the figure below you can view my basic customer frame based on a typical instance of Microsoft Dynamics CRM (in pencil, no less).
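For readers who prefer a keyboard to a pencil, the customer frame can also be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions: the relationship list and entity names are hypothetical and are not taken from an actual Dynamics CRM schema. In practice you would fill the list from whatever foreign-key information your DBA or database catalog can supply.

```python
# Hypothetical relationships as (child_entity, parent_entity) pairs.
# These names are illustrative only, not a real Dynamics CRM schema.
RELATIONSHIPS = [
    ("address", "customer"),
    ("contact", "customer"),
    ("order", "customer"),
    ("order_line", "order"),
    ("customer", "territory"),
]


def customer_frame(relationships, root="customer"):
    """Return the root entity plus every entity directly related to it."""
    related = set()
    for child, parent in relationships:
        if child == root:
            # The root points at this entity (for example, a lookup table).
            related.add(parent)
        elif parent == root:
            # This entity points at the root (for example, a detail table).
            related.add(child)
    return {"entity": root, "related": sorted(related)}


frame = customer_frame(RELATIONSHIPS)
print(frame["entity"], "->", ", ".join(frame["related"]))
```

Note that `order_line` is left out of the frame because it relates to `order` rather than directly to `customer`. That matches the big-picture-first approach: the frame stays one hop wide, and deeper entities are picked up later as the analysis narrows.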