How do you transition from a quality management role into a data quality focused position?
In this interview, Gabriel Mantilla of Wolman Group in South America answers this and a range of other questions that explore the tactics of applying quality principles to data.
How did you get started in a traditional quality management role?
Gabriel: Between 1990 and 1995 I worked at Mobil Corporation, where I was introduced to TQM through Dr. Deming's approach and had the opportunity to apply its different strategies (Kaizen, SPC, ISO 9000, etc.).
When did you move from pure quality management to data quality management?
Gabriel: In 1996 I started a project, still ongoing, focused on applying TQM concepts to data management. For this I drew an analogy between a lubricants factory (LF) and a data factory (DF), which is what any organization is.
In other words, I set myself the challenge of managing the "manufacture" of quality data using criteria similar to those used to produce, for example, Mobil 1. The quality of the final product is the result of the quality of the different manufacturing processes.
The biggest difference in this analogy is that in the LF, manufacturing processes are well defined and perfectly predictable, whereas in the DF, manufacturing is quite unpredictable.
Dylan Jones: For people starting the transition from quality management to data quality management, this unpredictability can be a concern. Can you describe some ways in which a "Data Factory", as you describe it, is more unpredictable than a traditional manufacturing process?
Gabriel: In the LF, the PDCA cycle is driven by customer orders. Production is planned (Plan); purchased inputs (base oils, additives, etc.) are mixed in a blending plant to become the final product (Do); samples are taken and compared with specific standards to determine whether the product meets the specifications (Check). If it does, the product goes to packaging and storage and is finally delivered to customers. If it does not, it is generally placed in special containers marked with a large "X", and corrective actions are taken both on the product (recycle or discard) and on the process (Act).
All process steps are clearly defined and perfectly predictable. The product is tangible and keeps its integrity while it is not in use; it is use that degrades product quality.
In the DF, data are intangible and highly volatile. The predictable production line can be compared to transaction processing, where data quality depends heavily on reference and master data. As the organization expands its reach, new values of valid master data become unpredictable: they can be created in any process, at any time, by anyone, following any standard, and the process should not be stopped to wait for rigorous quality control.
Those who implement the processes usually assume that they are managing data quality, but in reality they are not. Data are of good quality if they reflect reality and meet a set of complementary dimensions. But the accuracy of master data (the most important dimension of data integrity) inevitably and unpredictably deteriorates. It is therefore essential to define a process for managing reference and master data that transcends the boundaries of the organization's different business processes (operational, collaborative, analytical and management), framed within a generic data life cycle, so as to provide data quality services to the different processes. Here I recommend creating a "Universal MDM" that provides data quality services through cloud computing.
Unlike the lubricant, data are volatile (mixable, transformable, malleable and liable to become outdated). Nonconforming data must be marked with an "X" by means of quality states. The same data can be expressed in many ways, depending largely on the will of the person who generates them. Data is a perishable resource: what was true yesterday may not be true today, and this happens inevitably and unpredictably. The only way to know whether data has degraded is to compare it with reality. The more the data is used, the better its quality is maintained.
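The idea of marking nonconforming data with an "X" through quality states can be sketched in Python. This is a minimal, hypothetical illustration, not Wolman Group's implementation: the state names ("OK", "STALE", "X"), the `Record` type, the two example rules and the 365-day verification window are all assumptions, chosen to show that a record can fail a rule outright or simply become suspect because it has not been compared with reality recently.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List, Optional

# A rule inspects a record's values and returns a failure reason, or None if it passes.
Rule = Callable[[dict], Optional[str]]


@dataclass
class Record:
    key: str
    values: dict
    last_verified: date          # last time the record was compared with reality
    state: str = "OK"            # hypothetical states: "OK", "STALE", "X" (nonconforming)
    reasons: List[str] = field(default_factory=list)


def name_present(values: dict) -> Optional[str]:
    # Hypothetical rule: a name must be present.
    return None if values.get("name", "").strip() else "missing name"


def phone_digits(values: dict) -> Optional[str]:
    # Hypothetical rule: a phone must be at least 7 digits, digits only.
    phone = values.get("phone", "")
    return None if phone.isdigit() and len(phone) >= 7 else "invalid phone"


def assign_state(record: Record, rules: List[Rule], today: date,
                 max_age_days: int = 365) -> Record:
    """Mark rule failures with "X"; flag records unverified for too long as STALE."""
    record.reasons = [reason for reason in (rule(record.values) for rule in rules) if reason]
    if record.reasons:
        record.state = "X"
    elif today - record.last_verified > timedelta(days=max_age_days):
        record.state = "STALE"   # perishable data: it may no longer reflect reality
    else:
        record.state = "OK"
    return record


rules = [name_present, phone_digits]
today = date(2024, 6, 1)
ok = assign_state(Record("c1", {"name": "GABRIEL", "phone": "5712345"}, date(2024, 1, 10)), rules, today)
bad = assign_state(Record("c2", {"name": "GABRIEL", "phone": "57-AB"}, date(2024, 1, 10)), rules, today)
old = assign_state(Record("c3", {"name": "GABRIEL", "phone": "5712345"}, date(2020, 1, 1)), rules, today)
print(ok.state, bad.state, old.state)  # OK X STALE
```

Separating "fails a rule" from "not verified recently" reflects the point that only a comparison with reality reveals degradation.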
How have you constructed your quality principles for data quality management?
Gabriel: For this I defined a data management process that operates across the different processes of any type of organization, and for which data governance, MDM and other concepts now in vogue have been very useful.
In these 15 years, a lot of water has flowed under the bridge, with many experiences in Latin American countries that I would like to share in more depth.
The transition was easy because I had a very clear grasp of TQM concepts and good experience working with data. I adapted TQM to data by creating a data quality model, which I call the "Model Wolman". I chose the hard way (developing it myself) for 2 reasons:
- There was no literature on the subject (or at least it was not very accessible in our environment), and
- I wanted to avoid bias and build a homegrown ("Creole") model, because I perceived a tendency to treat data quality as a technology issue rather than as a matter of methodology or processes.
Is access to literature on data quality management still a problem in Latin America? Is this a translation issue, or a lack of conferences, training, etc.? What do you think are the main issues?
Gabriel: Today the literature is accessible via the Internet. However, there are no training programs, international institutions and associations have little presence, and, most importantly, there is no data quality culture. That culture is growing very reactively, through regulated sectors such as finance. Data quality is seen as optional, and actions are taken with a short-term view. Companies usually have an IT manager, but very few have an information manager; therefore, data governance is scarce too.
In less regulated sectors, data quality initiatives come from executive leadership, generally in the marketing area and, in a few cases, in IT.
For those of us who have embarked on the adventure of evangelization, the process has generally been difficult, and I think that the presence of international partnerships would be a good catalyst for change.
You’re obviously a strong believer in defining your own concepts for data quality management as opposed to following any one approach prescriptively. Can you explain why?
Gabriel: While there are several authors and good literature on the subject, I think the first thing we must do to shorten the transition is to define each of our own concepts, to avoid confusion and avoid lengthening the process. In this sense, debates like this one are the most valuable tool for this purpose.
For example, last December I gave a seminar in Colombia on data and information governance and management. While preparing it, I wanted to present the basic definitions and found, in books by renowned authors, about 10 different definitions of both "data" and "information" (mine was the 11th). All are appropriate, but different.
I would say there are as many definitions as there are authors on the subject, which is worrying. I think the way forward is to keep discussing what we think we know in order to draw conclusions and arrive at simple definitions, starting with the basics and moving on to their applicability in different environments or countries.
What is the Model Wolman - can you expand on this?
Gabriel: The Model Wolman is the original, unpublished approach to data quality management of the company I represent (Wolman Group - The Information Quality Company).
As I said before, we rely on Deming's TQM concepts from his book "Out of the Crisis" (incidentally, journalist Mary Walton's book masterfully summarizes the 14 principles and the 7 deadly diseases of management). We adapted these concepts to data.
Over time I have learned other quality models, such as ISO 9000 (1987, 1994, 2000 and 2008), Lean and Six Sigma, among others, all very well founded, and compared and integrated them with our approach. Controlling waste and reducing process variability have been among the pillars of TQM.
In short, that is exactly what the Model Wolman is: an approach that incorporates new concepts from different fields of knowledge. The Model Wolman does not exclude anything or anyone; it takes the best of each and adapts it holistically.
We’ve featured a number of Lean and Six Sigma techniques in the past on Data Quality Pro, so I was excited to hear you include these. Are there any specific Lean or Six Sigma tools that you rely on the most, or that have been the most effective?
Gabriel: We use several techniques, among others:
- SIPOC diagrams, for modeling processes.
- Brainstorming, to improve processes.
- Capability indices (process capability), to measure performance and facilitate comparison with other processes.
- Control charts, to reduce variability.
- Cause-and-effect analysis, to analyze consequences (preventive and reactive).
- Pareto analysis, to prioritize the analysis of causes.
- RACI matrices, to control how activities are carried out.
- Regression and correlation, to analyze how one variable affects another.
- Forecasting, to project patterns based on past behavior.
- Risk management, to detect, correct and prevent unintended consequences.
- Sampling, to keep a process under observation and control.
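Of the tools listed, Pareto analysis is the simplest to show in code. A minimal sketch, assuming defects have already been tagged with a cause (the cause labels and the 80% cutoff are illustrative assumptions, not from the interview): it ranks causes by frequency and returns the "vital few" that account for most defects, which is where improvement effort would go first.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto(causes: Iterable[str], cutoff: float = 0.8) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Rank defect causes by frequency and return the "vital few" that
    together first reach the `cutoff` share of all defects."""
    ranked = Counter(causes).most_common()   # [(cause, count), ...] most frequent first
    total = sum(count for _, count in ranked)
    vital: List[str] = []
    cumulative = 0
    for cause, count in ranked:
        if cumulative / total >= cutoff:     # cutoff already reached: stop
            break
        vital.append(cause)
        cumulative += count
    return ranked, vital


defects = (["missing city"] * 50 + ["invalid phone"] * 30
           + ["duplicate name"] * 12 + ["state typo"] * 8)
ranked, vital = pareto(defects)
print(vital)  # ['missing city', 'invalid phone']
```

An empty input simply yields two empty lists, since the loop never runs.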
What is your current Sigma level and how did you get started moving up the Sigma levels?
Gabriel: Our first data quality algorithm was written to standardize addresses that followed no standard. Its initial efficiency was 3 sigma, and now we are about to reach 7 sigma. Since then (1994), I have developed over 500 new algorithms (or rules) that apply to different fields in a database, such as countries, states, cities, phone numbers, identification numbers, names, corporate names, products and so on.
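The interview does not say how these sigma levels were measured. Under the usual Six Sigma convention, which is an assumption here, a level is derived from defects per million opportunities (DPMO), for instance addresses the algorithm fails to standardize, with a 1.5-sigma long-term shift. A minimal sketch using only the standard library:

```python
from statistics import NormalDist

SHIFT = 1.5  # conventional Six Sigma long-term shift (assumption)


def dpmo(defects: int, opportunities: int) -> float:
    """Defects per million opportunities, e.g. addresses a rule fails to standardize."""
    return defects / opportunities * 1_000_000


def sigma_level(dpmo_value: float, shift: float = SHIFT) -> float:
    """Sigma level for a DPMO value (requires 0 < dpmo_value < 1,000,000)."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift


def dpmo_for_sigma(sigma: float, shift: float = SHIFT) -> float:
    """Inverse: the DPMO implied by a given sigma level."""
    return (1 - NormalDist().cdf(sigma - shift)) * 1_000_000


print(round(sigma_level(66_807), 2))   # 3.0
print(round(dpmo_for_sigma(6), 1))     # 3.4
print(round(dpmo_for_sigma(7), 3))     # 0.019
```

Under this convention, going from 3 to 7 sigma means going from roughly 66,807 defects per million to a few hundredths of one.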
How do you manage those algorithms or rules? Are they in a repository or tool?
Gabriel: The first thing, before developing the algorithms, was to create the data quality process, which I call "EQL" (Extract, Quality and Load). I drew the analogy with ETL tools but focused on the "Q", on the premise that the "T" (Transformation) does not add much to quality. Next I developed a data quality tool (embedded in the Q), which I call the "5 C's" (Configuration, Cleansing, Completeness, Consolidation and Confirmation). Finally, I developed the set of data quality rules for each "C".
Deming defined "process" as the activities through which inputs are transformed to produce a final result. The inputs were classified into the 5 M's, one of which is Machine (tools). Therefore, my approach to data quality is not a tool but a process that incorporates the tool, and it is based on rules or algorithms that are reused across different data quality projects or processes (migration, integration, data entry, etc.), framed within a data life cycle.
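The EQL flow, with the 5 C's embedded in the "Q" step and rules reused from a repository, might be sketched as follows. Everything here is hypothetical: the interview does not describe the tool's internals, so `RuleRepository`, `eql` and the two example rules only illustrate a process that applies reusable rules, grouped by "C", between extraction and loading. Rules operate on whole batches so that a Consolidation rule can merge records coming from different sources.

```python
from typing import Callable, Dict, Iterable, List

Batch = List[dict]
BatchRule = Callable[[Batch], Batch]

# The five "C" stages, in the order named in the interview.
STAGES = ("configuration", "cleansing", "completeness", "consolidation", "confirmation")


class RuleRepository:
    """Hypothetical store of reusable rules, grouped by "C"."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[BatchRule]] = {stage: [] for stage in STAGES}

    def register(self, stage: str) -> Callable[[BatchRule], BatchRule]:
        def decorator(rule: BatchRule) -> BatchRule:
            self.rules[stage].append(rule)
            return rule
        return decorator


def eql(extract: Iterable[dict], repo: RuleRepository,
        load: Callable[[Batch], None]) -> None:
    """Extract, apply the quality rules stage by stage, then load."""
    batch = list(extract)                    # E: extract
    for stage in STAGES:                     # Q: the 5 C's, in order
        for rule in repo.rules[stage]:
            batch = rule(batch)
    load(batch)                              # L: load


repo = RuleRepository()


@repo.register("cleansing")
def normalize_names(batch: Batch) -> Batch:
    # Collapse repeated spaces and upper-case names.
    return [{**r, "name": " ".join(r.get("name", "").split()).upper()} for r in batch]


@repo.register("consolidation")
def merge_duplicates(batch: Batch) -> Batch:
    # Keep the first record for each (name, phone) pair.
    seen, merged = set(), []
    for r in batch:
        key = (r["name"], r.get("phone"))
        if key not in seen:
            seen.add(key)
            merged.append(r)
    return merged


target: Batch = []
eql([{"name": " gabriel  mantilla", "phone": "5712345"},
     {"name": "GABRIEL MANTILLA", "phone": "5712345"}], repo, target.extend)
print(target)  # [{'name': 'GABRIEL MANTILLA', 'phone': '5712345'}]
```

Because rules live in the repository rather than in the pipeline, the same `normalize_names` rule could be reused unchanged in a migration, an integration or a data-entry process.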
How do you go about creating your definitions?
Gabriel: To make our approach easy to understand and to help improve the data quality culture, I relied on metaphors drawn from the workings of nature and the universe. For example, for us "data are to an organization what blood is to the human body." We took similar approaches to processes, information technology, business and, in general, the different elements of an enterprise architecture.
Once we had defined the approach, we continued with easily understandable definitions of data, information and quality, among others, to facilitate communication both internally and with our customers. These were our "best practices", hence the concern I mentioned above. Part of our approach was to define policies that break with the status quo in the different fields of knowledge.
Then I defined the data quality dimensions, an information life cycle with 4 generic processes, a maturity model, a rule-based data quality tool we call the 5 C's (the kidneys of the organization), an operational process for managing data quality and integration projects (EQL), and a data governance and management process focused on MDM and DWH / BI.
What dimensions of data quality management have you defined?
Gabriel: Our data quality dimensions are aligned with the EQL process, and some of them go beyond the data environment. For example, in a typical integration project, the first is the "data modeling dimension" of the current sources (homogeneous or heterogeneous). Many data quality problems are caused by a poor data model, or by failing to apply referential integrity rules rigorously. Then we apply data quality rules to each individual source for the "dimensions of validity, consistency and accuracy." Next we apply data quality rules to the "dimension of integration", where the different data sources (internal and external) are mixed. Finally, we apply rules in the "dimension of consolidation", where data from the individual sources are moved and restructured into a new data model, such as a single customer base for implementing a CRM process.
Each of these dimensions has its subdomains, which in turn contain the data quality rules.
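Measuring a source against several of these dimensions can be sketched in plain Python. The field names, the phone format, the city/state reference table and the choice of checks are illustrative assumptions: referential integrity stands in for the data modeling dimension, a format check for validity and a reference-data lookup for consistency. Accuracy is left out because, as noted earlier in the interview, it can only be measured against reality.

```python
import re
from typing import Dict, List, Set, Tuple


def share(passed: int, total: int) -> float:
    """Fraction of items that pass a check (1.0 for an empty set)."""
    return passed / total if total else 1.0


def dimension_scores(customers: List[dict], orders: List[dict],
                     city_state_ref: Set[Tuple[str, str]]) -> Dict[str, float]:
    customer_ids = {c["id"] for c in customers}
    # Data modeling dimension: every order must reference an existing customer.
    integrity = share(sum(o["customer_id"] in customer_ids for o in orders), len(orders))
    # Validity: phone numbers must be 7 to 10 digits (hypothetical format).
    validity = share(sum(re.fullmatch(r"\d{7,10}", c.get("phone", "")) is not None
                         for c in customers), len(customers))
    # Consistency: the city/state pair must exist in the reference data.
    consistency = share(sum((c.get("city"), c.get("state")) in city_state_ref
                            for c in customers), len(customers))
    return {"data_modeling": integrity, "validity": validity, "consistency": consistency}


customers = [
    {"id": 1, "phone": "5712345", "city": "Bucaramanga", "state": "Santander"},
    {"id": 2, "phone": "57-12", "city": "Bucaramanga", "state": "Antioquia"},
]
orders = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 9}]
scores = dimension_scores(customers, orders, {("Bucaramanga", "Santander")})
print(scores)  # {'data_modeling': 0.666..., 'validity': 0.5, 'consistency': 0.5}
```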
You mentioned a life cycle of information with 4 generic processes - what are those?
Gabriel: The 4 processes of our ILM (information lifecycle management) in which data quality is applied are:
- Entry, to bring real-world data to electronic media.
- Movement and restructuring, to transform data from multiple homogeneous or heterogeneous sources to a new and unique data model.
- Use, to "oxygenate" the data of the different applications and processes, and
- Feedback, to compare the data in electronic media with the real world.
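The Feedback process, comparing stored data with the real world, is often done on a sample, since verifying every record is costly. A minimal sketch, assuming a hypothetical set of values that someone has re-verified (for example by calling the customer): it draws a random sample of keys to verify and measures field-level agreement between stored and verified values. The sampling approach and all names are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_for_verification(keys: List[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Pick up to n record keys at random for real-world verification."""
    rng = random.Random(seed)
    return rng.sample(keys, min(n, len(keys)))


def feedback_accuracy(stored: Dict[str, dict],
                      verified: Dict[str, dict]) -> Tuple[float, List[Tuple[str, str]]]:
    """Compare stored values with values verified in the real world, field by field."""
    checked, matches, mismatches = 0, 0, []
    for key, real in verified.items():
        for field_name, real_value in real.items():
            checked += 1
            if stored.get(key, {}).get(field_name) == real_value:
                matches += 1
            else:
                mismatches.append((key, field_name))
    return (matches / checked if checked else 1.0), mismatches


stored = {
    "c1": {"phone": "5712345", "city": "Bucaramanga"},
    "c2": {"phone": "5700000", "city": "Cali"},
    "c3": {"phone": "5711111", "city": "Bogota"},
}
to_check = sample_for_verification(sorted(stored), 2, seed=7)
# Hypothetical real-world findings (in practice, gathered for the keys in to_check).
verified = {
    "c1": {"phone": "5712345", "city": "Bucaramanga"},
    "c2": {"phone": "5799999", "city": "Cali"},
}
accuracy, wrong = feedback_accuracy(stored, verified)
print(accuracy, wrong)  # 0.75 [('c2', 'phone')]
```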
Gabriel: My maturity model is based on an organization's gradual progress in implementing data quality concepts, from the simplest to the most complex. The simplest is the knowledge each person has about something, used to develop an individual model. The most complex is gradually and methodically incorporating "the best practices" from different areas of knowledge into one's own model. The levels are:
- Initiation: when I am convinced that the Wolman model benefits me, and I commit to applying it.
- Definition: when I conceptualize my own model (with the best practices I already know).
- Application: when I put my own model into practice (preferably in a single high-impact project).
- Consolidation: when I review the lessons learned, adjust the model and extend it to other areas or processes.
- Optimization: when I examine "the best practices" from different knowledge areas, understand them and incorporate them into my own model, continuing the cycle.
This approach applies to any field of knowledge and to all the fundamental concepts that govern an organization. For processes, we use the 5 M's. For data quality, we created the 5 C's. For risk management, we created the 5 R's. For innovation, we created the 5 I's. For marketing, we created the 5 P's, and so on. The goal is to create a "5 in everything" organization (5 was the highest grade when I was a student).
What are the 5 C’s, those kidneys of the organisation you refer to?
Gabriel: Continuing the blood metaphor, if "processes are to an organization what organs are to the human body", then EQL (and the 5 C's) are the kidneys of the organization. So I feel like a hematologist for organizations: I measure an organization's health based on its data quality indices. At first we offered the 5 C's as a service, but now we have developed a tool that is installed in the organization (or in the cloud) to be used by all the data quality services across the ILM. That's why we call the 5 C’s the kidneys of the organization.
You mentioned that you’re starting to do some work around BPM and data quality - can you expand?
Gabriel: I recently started working with BPM because of the difficulty of incorporating data quality into processes (the lack of culture). Data quality should not be optional when building processes (BPM does not even mention it). Last year I learned about ACM, which differs from BPM in that it puts data at the core and, eureka! Without intending to, we had been putting data at the core throughout nearly 20 years of continuous work. I still have not gotten over the surprise. This allows us to build a new generation of processes in which data (and information) quality is the core. New processes are faster to build, and their operation is more efficient, better performing and easier to maintain. This is a new approach not only to software engineering but also to processes.
Forgive me, what do you mean by the acronym "ACM"? Adaptive Case Management? If so, how has this technique specifically helped you build a new generation of processes with data quality at the core? Why, for example, is the construction of new processes so much faster?
Gabriel: Yes, ACM is Adaptive Case Management. I was influenced by the book "Mastering the Unpredictable" by Keith D. Swenson when I read it last year, after having worked with BPM for several years. ACM is an approach I had been working with for several years, and I was very surprised to identify so fully with its concepts. The secret is to change the BPM approach, which puts workflows at the core with data around them. With ACM, data is at the core and processes revolve around it.
The story began more than 20 years ago, when I was a consulting manager at Ernst & Young, where I met the famous James Martin (one of the originators of Information Engineering). He placed great emphasis on the importance of data in software engineering projects, an approach he illustrated with a 3-sided pyramid (data, processes and technologies). One of his concerns was how to incorporate strategy, and therefore the business changes that affect applications, without generating high maintenance rates.
Basically, Information Engineering was Dr. Martin's software engineering approach. I started from the premise that the solution to his concern should focus on doing real "engineering of information", that is, viewing information (and data) from another perspective. I did not see it as one face of the pyramid but as the core of what should be a new approach to enterprise architecture (data, processes, technologies and business). So my approach is illustrated with a molecule (or an atom, or a solar system) whose core is "information quality", around which the other elements of enterprise architecture orbit. At the core are the data, the Meta-data and the WRE (Wolman Rules Engine).
With this approach, we can build a new process by designing the core and using the WRE, so that changes affect the Meta-data rather than the programs, which solves maintenance problems. Anything can change, so everything must be modeled in the core.
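The idea that changes should hit the Meta-data rather than the programs can be illustrated with a small metadata-driven validator. This is a sketch, not the WRE: the metadata layout and the check names are hypothetical, standing in for rows of a Meta-data table. The program only interprets the metadata, so accepting a new country code becomes a data change rather than a code change.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical metadata rows, as they might be read from a Meta-data table.
METADATA: List[Dict[str, Any]] = [
    {"field": "phone", "check": "regex", "arg": r"\d{7,10}"},
    {"field": "country", "check": "in_list", "arg": ["CO", "VE", "EC"]},
    {"field": "name", "check": "max_len", "arg": 40},
]

# Generic check implementations; no field names or allowed values are hard-coded here.
CHECKS: Dict[str, Callable[[Any, Any], bool]] = {
    "regex": lambda value, arg: re.fullmatch(arg, str(value)) is not None,
    "in_list": lambda value, arg: value in arg,
    "max_len": lambda value, arg: len(str(value)) <= arg,
}


def failing_fields(record: Dict[str, Any], metadata: List[Dict[str, Any]]) -> List[str]:
    """Return the fields of a record that violate the rules described in the metadata."""
    return [m["field"] for m in metadata
            if not CHECKS[m["check"]](record.get(m["field"], ""), m["arg"])]


record = {"phone": "5712345", "country": "PE", "name": "GABRIEL"}
print(failing_fields(record, METADATA))  # ['country']

# A business change (accepting Peru) is a metadata edit, not a program edit.
updated = [dict(m) for m in METADATA]
updated[1]["arg"] = updated[1]["arg"] + ["PE"]
print(failing_fields(record, updated))  # []
```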
You mentioned the WRE - Wolman Rules Engine - can you expand?
Gabriel: Those who have developed data quality rules will have noticed that they are much more complex than the business rules and applications typical of software engineering (ERP, CRM, ...). This led us to create the "WRE - Wolman Rules Engine", which is composed of 3 engines:
- Data quality rules. These are the most complex rules, and the area where our experience is strongest.
- Business rules. These are the conditions of each business (process, application or service) that provide flexibility. All business parameters and conditions belong here.
- Processing rules. These are the newest and the most costly. Here the activities of any organization are "factored".
The WRE is the main tool of our approach to make processes faster, more efficient and easier to maintain.
I can understand the need for processing rules, but in your view, what is the difference between a data quality rule and a business rule? Many would argue that they’re essentially the same.
Gabriel: A data quality rule focuses on detecting the quality level of data (data profiling) or correcting data found to be unacceptable (data cleansing), thereby qualifying and improving the level of data quality. A business rule focuses on implementing a specific policy for carrying out an activity within a process, giving the process flexibility when the organization decides to change policy. Some data quality rules use business rules, and some business rules also use data quality rules, but the two types of rule must remain conceptually distinct.
For example, when comparing the name "GABRILE" obtained from one source (a call center) with the name "GABRIEL" obtained from the Universal MDM, the data quality rule is the one that matches the two names. The business rule could require a 100% match, or a lower value corresponding to an acceptable threshold. Generally, the data quality rule does not change, while the business rule may change with business policies, which depend on the level of risk the organization accepts.
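The GABRILE/GABRIEL example can be made concrete by keeping the two kinds of rule in separate functions. As an assumption, Python's `difflib.SequenceMatcher` stands in for the matching algorithm, which the interview does not specify: the data quality rule computes a similarity score and stays fixed, while the business rule compares that score with a threshold that the organization can change according to the risk it accepts.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Data quality rule (stable): how closely two names match, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.strip().upper(), b.strip().upper()).ratio()


def accept_match(similarity: float, threshold: float) -> bool:
    """Business rule (policy): accept the match if it meets the risk-based threshold."""
    return similarity >= threshold


score = name_similarity("GABRILE", "GABRIEL")   # 12/14, about 0.857, with difflib's ratio
print(accept_match(score, 1.0))    # False: strict policy requires an exact match
print(accept_match(score, 0.85))   # True: a more tolerant policy accepts it
```

Tightening the policy to a 100% match rejects the call-center value without touching the matching rule.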
How do you deliver the "Wolman Rules Engine"? What type of technology or methodologies do you implement?
Gabriel: The methodology is software engineering applied to each "atomized" component of the WRE. Processes run in a web environment, both batch and online. The technologies currently used are:
- Windows Server 2003 and Linux, for operating systems.
- Oracle, to store data and Meta-data.
- PHP, for the presentation or user interfaces, business rules and processing rules.
- Visual FoxPro, for data quality rules.
We are porting the WRE rules to different technologies, such as database engines, programming languages (like Java and Python) and operating systems, to expand the tool's technological possibilities.
What we want is that when somebody asks, "What technology is your tool built on?", we can answer, "What technology do you want to build your processes on?"
I have to say, it’s an absolute pleasure to find people like yourself who are so committed to innovation, both in your own learning and in adapting it to your employer. Where next for your education? What new skills are you looking to learn? It strikes me that you strive to improve not only your organisation but also yourself as a practitioner.
Gabriel: I believe that systematic training in data quality could complement formal education, which would expand the prospects of professionals in any field of knowledge.
Inevitably, we will pass from the era of information and knowledge to the age of wisdom. Therefore, I recommend reading the works and biographies of thinkers with humility and respect, but critically. I think it is not enough to know the who, what, how, when, where and why behind the creation of concepts. It is very important to ask ourselves "what for?" about each statement or concept.
What lessons or advice can you share for other members who have a quality management background and are looking to move into data quality management?
Gabriel: I recommend trying to mix quality knowledge with concepts from every field of knowledge. Personally, I have been an admirer of the work of Deming (quality), Martin (data modeling), Drucker (philosophy applied to business), Einstein (imagination), Descartes (reason), Leonardo (art + science) and Socrates (knowledge from ignorance), among others, including contemporary thinkers. The further back I go, the more wisdom I find. We should apply our imagination to each concept to try to create our own models by mixing all kinds of knowledge. The more heterogeneous, the better.
Let me mention one phrase that influenced me and changed my life when I was in high school, which I read in the book "The Discourse on Method" by Rene Descartes: "People should devote their lives to cultivating reason and advancing ever further in the knowledge of the truth." This was the basis of his philosophy of "methodical doubt" (I doubt, therefore I think; I think, therefore I am), which for me is the basis of continuous improvement and innovation. It allowed me to create my own innovation policy, which states:
"Today's best practices will be tomorrow's bad practices. So let's try to imagine the future and invent it."
You've shared so many insights, thanks for your time today Gabriel.
Contributor Bio - Gabriel Gómez Mantilla
Systems Engineer, graduated from the Universidad Industrial de Santander (UIS), Colombia, with 28 years of experience in project management focused on managing data and information as a tangible and strategic resource for organizations.
He serves as General Manager of Wolman Group (The Information Quality Company), where he created the "Model Wolman", which focuses on managing data and information quality based on Total Quality Management (TQM). He was Consulting Manager at Ernst & Young and Systems Manager for Latin America at Mobil Corporation. He was President of the Colombian Association of Systems Auditors.
He has participated in about 25 Customer Data Integration (CDI) projects for CRM implementations, as well as 25 other projects covering data migration, database improvement, and DWH / BI and MDM implementation for major companies operating in Latin America.
He is an expert in Information Engineering (IE), Total Quality Management (TQM), Total Data and Information Quality Management (TDIQM), Master Data Management (MDM), Data Warehouse / Business Intelligence (DWH / BI), Business Process Management (BPM) and Adaptive Case Management (ACM).