I recently obtained a copy of "Managing Blind: A Data Quality and Data Governance Vade Mecum", written by Peter Benson, whom many of you will no doubt have learned from in past interviews here on Data Quality Pro.
Peter has a vast amount of experience in Data Quality and Data Governance and has helped the industry move on considerably with his leadership of the ISO 8000 Data Quality standard.
I recently caught up with Peter to ask about some of the topics raised in the book.
Dylan Jones: Thanks for taking time out to talk about your new book Peter. Did you have a target audience in mind when you were planning the book?
Peter Benson: Managers grappling with the challenges and opportunities of data quality and data governance projects.
Dylan Jones: You talk of meeting the creators of ISO 9000 and asking what they would do if they could do it again, to which they responded "avoid the requirement for an audit". How did this impact the direction you took for ISO 8000?
Peter Benson: We had to find a solution that did not require an audit. It was not really that difficult. Computers by definition can read data so they should be able to assess the quality of the data. The key is being able to represent data requirements in a computer processable form. If both the data and the data requirements are computer processable then a computer can assess the quality of the data. It really is that simple.
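As an illustration (not from the interview), here is a minimal Python sketch of what Peter describes: a data requirement written in a computer-processable form, and a function that lets the computer assess a record against it. The requirement layout, the property names and the `assess` function are all hypothetical and are not taken from ISO 8000.

```python
import re

# Hypothetical data requirement: each property states whether it is
# mandatory and, optionally, a pattern the whole value must match.
REQUIREMENT = {
    "part_number": {"mandatory": True, "pattern": r"[A-Z]{2}-\d{4}"},
    "unit_of_measure": {"mandatory": True, "pattern": r"(kg|m|mm|each)"},
    "colour": {"mandatory": False, "pattern": None},
}


def assess(record, requirement):
    """Return a list of the ways the record fails the requirement."""
    failures = []
    for name, rule in requirement.items():
        value = record.get(name)
        if value is None or value == "":
            # An absent or empty value only matters if it is mandatory.
            if rule["mandatory"]:
                failures.append(f"{name}: missing mandatory value")
            continue
        if rule["pattern"] and not re.fullmatch(rule["pattern"], value):
            failures.append(f"{name}: '{value}' does not match {rule['pattern']}")
    return failures


good = {"part_number": "AB-1234", "unit_of_measure": "kg"}
bad = {"part_number": "1234", "unit_of_measure": ""}

print(assess(good, REQUIREMENT))  # an empty list: nothing to report
print(assess(bad, REQUIREMENT))   # one failure per offending property
```

Because both the record and the requirement are plain data, no human audit is needed to produce the list of failures; changing the requirement changes the assessment without touching the checking code.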
Dylan Jones: In the book you talk of how seemingly small data errors can have exponential impacts. Can you provide any other examples of this so that people can understand just how big these issues can become?
Peter Benson: A good example of an "impact" would be the first Mars Lander, where an error in the communication of a unit of measure resulted in an incorrect firing sequence and the immediate loss of the $250 million invested in the program.
We are all suffering today from the impact of a lack of quality data in the financial markets. Calculating risk requires quality data; without this data you are simply taking blind gambles. This is what happened in the real estate derivative markets and it continues today in the sovereign debt market.
At a smaller scale, simply booking a hotel room, a flight or a car requires quality data; without it you will pay more for less.
In almost any company you can find substantial and immediate savings by simply identifying duplicates in the material master, or alternatively by improving the descriptions in the material master so that the negotiated prices are actually used instead of free-text or off-contract purchases.
The real problem is that when you calculate these potential savings they are so large that they are either unbelievable or they are believable and someone may get fired.
Cost saving is not very sexy. What excites a company is sales and growth, but here too quality data can make a real difference very quickly through better customer profiling, more accurate billing and faster delivery.
Finally, if you are looking for compounding data errors, look no further than the risks associated with providing regulators with incomplete or incorrect data: a single small error can bring all the data into question and launch an audit.
Dylan Jones: You talk about how something as common as a DUNS number had no definition. Can you describe the challenges you faced trying to resolve that problem? I suspect it’s a problem many companies will face when trying to assign their own identifiers.
Peter Benson: I call this the "everyone knows" syndrome. The lack of written definitions for data elements is one of the primary causes of data problems, and it is the easiest to fix.
When you start to write the definition you rapidly see where the problems are. In the case of the DUNS number we knew it was intended as a business identifier but what was the definition of a business?
The strict interpretation is a legal trading entity, as you must have a legal status to enter into a contract, so yes, a sole trader is a business. But what about the location? Is each separate location a business in its own right? And what happens when a business buys a business, and what do you mean by "buy"?
As any data modeler knows the devil is in the detail. What exactly does the identifier represent and more importantly does the actual data comply with the definition?
In the case of the DUNS number it was, and I expect it still is, a challenge, as there were businesses with a DUNS number that had several locations without DUNS numbers, and locations with DUNS numbers that belonged to a business with a different DUNS number.
The agreed definition was "A legal trading entity at a physical location". This is the same definition used by the Federal Central Contractor Registry (CCR). The trouble is the data is not always consistent with the definition. The ECCMA Organization Registry (eGOR) is an effort to address the issues found in both the DUNS and CCR databases.
In the end, identifiers are references to a master data record, so it should be possible to derive the definition of the identifier from the mandatory elements in the master data record.
Dylan Jones: Can you talk about a recent ISO 8000 success story to help demonstrate the impact this standard is having on the industry?
Peter Benson: I think the biggest success of ISO 8000 is the increased awareness that data quality is the degree to which data meets requirements, and that without a data requirement you cannot measure data quality.
Once companies have realised the importance of documenting their requirements for data they immediately realise that they need to create and manage a corporate dictionary. I hear of success stories almost daily and they are always the same. Something as simple as a corporate dictionary is improving the speed and accuracy of internal communication at all levels followed by better communications with suppliers and customers.
In terms of hard cash, big success has come from plant expansion projects, where planning for quality data has had a dramatic impact on the handover costs and the speed with which an ERP system is up and running.
Beyond the natural resource and manufacturing industries we are seeing amazing successes in the banking and insurance industries as they get a better grasp of risk.
Finally we are seeing some successful applications in the healthcare market, something I hope we will all benefit from.
Dylan Jones: At a high level, what are the basic steps companies must follow to adhere to ISO 8000 standards?
Peter Benson: Create your corporate dictionary and document your requirements for data. These are easy first steps and they allow you to measure the quality of your data.
Improving your data quality takes time and requires effort, so it is critical to understand the benefits. The answer lies in the data requirements: they will tell you what data you need to support what business functions, and this is the value of your data. Data that has no use has no value.
Dylan Jones: What are the mechanics of creating the corporate dictionary? What goes into it and what type of tools do companies use to store information in the dictionary?
Peter Benson: ISO 22745-10 includes the data model for an open technical dictionary but you can start with something as simple as a spreadsheet, a simple database or even a text file.
You need to include a concept identifier and associate this with any terms, definitions or images that explain the concept. One term and a definition should be mandatory but you may want to add synonyms and abbreviations.
Of course you can also build your corporate dictionary as a subset of the eOTD which simply means you are using the eOTD concept identifiers and you are using the terminology that is already in the eOTD. Above all else start small and make sure you socialise your dictionary amongst your colleagues.
Build your dictionary over time and keep focused on the data elements you really use. You may also want to consider something like the ECCMA Corporate Dictionary Manager (eCDM), an open source application that is linked to the ECCMA Open Technical Dictionary (eOTD) and the ECCMA Data Requirement Registry (eDRR).
Associate members ($350 per year) are given a single user name and password, while Full members ($5,000 per year) have unlimited corporate access. ECCMA also provides assistance in building their dictionary and data requirements, which are loaded into the eCDM.
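As an illustration (not from the interview), the dictionary structure Peter outlines, a concept identifier associated with one mandatory term and definition plus optional synonyms and abbreviations, could be sketched in Python as below. The class names, the identifier "C-0001" and the sample definition are hypothetical; this is not the ISO 22745-10 data model and the identifier is not a real eOTD one.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One dictionary entry: an identifier plus what explains the concept."""
    concept_id: str
    term: str            # mandatory
    definition: str      # mandatory
    synonyms: list = field(default_factory=list)
    abbreviations: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the "one term and a definition" rule at creation time.
        if not self.term.strip() or not self.definition.strip():
            raise ValueError("a term and a definition are mandatory")


class CorporateDictionary:
    def __init__(self):
        self._by_id = {}

    def add(self, concept):
        if concept.concept_id in self._by_id:
            raise ValueError(f"duplicate concept identifier: {concept.concept_id}")
        self._by_id[concept.concept_id] = concept

    def lookup(self, word):
        """Find concepts whose term, synonym or abbreviation matches word."""
        wanted = word.casefold()
        return [
            c for c in self._by_id.values()
            if wanted in {n.casefold() for n in [c.term, *c.synonyms, *c.abbreviations]}
        ]


dictionary = CorporateDictionary()
dictionary.add(Concept(
    concept_id="C-0001",  # made-up identifier for illustration only
    term="supplier",
    definition="A legal trading entity from which goods or services are purchased.",
    synonyms=["vendor"],
))

print([c.concept_id for c in dictionary.lookup("Vendor")])
```

The same shape fits a spreadsheet with one row per concept, which is why starting small in a spreadsheet or text file loses nothing structurally.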
Dylan Jones: Finally, one of the most interesting discussions you raise in the book is the problem many companies have with "intrinsic vs extrinsic" characteristics of data quality. Can you explain the difference? I see a lot of companies simply settle for the simple measures provided by data quality software, for example.
Peter Benson: The difference between intrinsic and extrinsic is best explained by the physical example of weight and mass.
Mass is an intrinsic characteristic of an item. It is independent of external forces such as gravity, and that is why an object will have the same mass on earth as it does in space, even if it is weightless. Weight is an extrinsic characteristic. It is the result of the combination of the intrinsic characteristic, mass, with an external force, gravity, so it varies with the force of gravity.
We apply the same concept to data. Intrinsic data quality is the quality of the data without reference to external requirements. Examples of intrinsic data quality problems would be missing data or badly formatted data. Extrinsic data quality is determined, for example, by comparing the data to a data requirement. ISO 8000 is focused on extrinsic data quality: does the data comply with an "externally defined" syntax, is the content semantically encoded, and does the data meet defined requirements?
Most commercial software applications that measure data quality only measure the intrinsic quality of the data. They do not measure how "useful" the data is, which is a measure of the extrinsic data quality.
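As an illustration (not from the interview), the mass and weight analogy can be sketched in Python: the intrinsic check looks only at the record itself, while the extrinsic check gives a different answer for the same record depending on which requirement it is compared with. The record, the two requirements and both function names are hypothetical, and treating "intrinsic" as "no empty values" is a deliberate simplification.

```python
record = {"description": "BEARING, BALL", "bore_diameter_mm": "25"}


def intrinsic_issues(rec):
    """Needs no external requirement: report properties with empty values."""
    return sorted(name for name, value in rec.items() if value.strip() == "")


def extrinsic_issues(rec, required_properties):
    """Compares against a requirement: report required properties that are absent or empty."""
    return sorted(p for p in required_properties if not rec.get(p, "").strip())


# Two hypothetical requirements, one per business function.
purchasing_requirement = ["description", "supplier"]
engineering_requirement = ["description", "bore_diameter_mm"]

print(intrinsic_issues(record))                           # []
print(extrinsic_issues(record, purchasing_requirement))   # ['supplier']
print(extrinsic_issues(record, engineering_requirement))  # []
```

The record looks clean on its own, yet it is only fit for one of the two business functions, which is the kind of "usefulness" an intrinsic-only measure cannot see.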
Contributor Bio - Peter Benson
Peter Benson is the Executive Director of the Electronic Commerce Code Management Association (ECCMA).
Peter served as the chair of the US Standards Committee responsible for the development and maintenance of the EDI standard for product data. He was responsible for the design, development and global promotion of the UNSPSC as an internationally recognized commodity classification, and for the design of the eOTD, an internationally recognized open technical dictionary based on the NATO codification system.
Peter is the project leader for ISO 8000, the new international standard for data quality, and for ISO 22745, the new standard for open technical dictionaries and the exchange of characteristic data.
Peter is an expert in distributed information systems, content encoding, master data management and the generation of high quality descriptions that are the heart of today’s ERP applications and the next generation of high speed and high relevance internet searches.