How to Create a Data Quality Centre of Excellence: Laura Sebastian-Coleman
In this interview, highly experienced Data Quality author and practitioner, Laura Sebastian-Coleman, provides a detailed account of her experiences in launching and growing a Data Quality Centre of Excellence (CoE) in a large organisation.
Laura has appeared before on Data Quality Pro and is the author of 'Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework'.
Dylan: It’s rare for companies to take such a progressive step by appointing a Data Quality Centre of Excellence (CoE). What was the rationale of management for creating it?
Laura: The CoE structure was put in place as a way of accelerating the overall strategy of the data governance program.
For many years, data quality thought leaders have recognized that improving the quality of data requires cultural change. This need is clear in Tom Redman’s exploration of the politics of data, as well as in the IAIDQ’s recognition of change management as one of the knowledge domains for data quality practitioners.
This kind of change can be achieved through a CoE that can run pilot projects or proofs-of-concept that demonstrate to the organization the value of process change and the adoption of best practices. The CoE can also be directly involved with helping the organization adopt change.
The purpose of any CoE is to develop best practices and methodologies in its area of expertise.
The Data Quality CoE focuses on a set of core functions.
These include:
Assessing the condition (or quality) of data
Measuring and monitoring data quality in production systems
Managing data quality issues
Reporting on all of these
Identifying opportunities for improving how we create and manage data
To execute these functions, we also need expertise in our toolset. The CoE’s focus is on critical data elements. We work with the data stewards to identify critical data and establish the rules by which we can certify it.
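As a concrete illustration of the rule-based certification idea, a minimal check over critical data elements might look something like the sketch below. The element names, rules, and threshold are hypothetical, not the CoE’s actual criteria:

```python
# Minimal, hypothetical sketch of rule-based certification for critical
# data elements. Field names, rules, and the threshold are illustrative
# assumptions, not the CoE's actual certification criteria.
import re

# Each rule maps a critical data element to a validity check.
CERTIFICATION_RULES = {
    "member_id":  lambda v: v is not None and re.fullmatch(r"\d{9}", str(v)),
    "birth_date": lambda v: v is not None and str(v) != "",
    "gender":     lambda v: v in {"M", "F", "U"},
}

def certify(records, threshold=0.99):
    """Return the pass rate per element and whether it meets the threshold."""
    results = {}
    for element, rule in CERTIFICATION_RULES.items():
        values = [r.get(element) for r in records]
        passed = sum(1 for v in values if rule(v))
        rate = passed / len(values) if values else 0.0
        results[element] = {"pass_rate": rate, "certified": rate >= threshold}
    return results

if __name__ == "__main__":
    sample = [
        {"member_id": "123456789", "birth_date": "1980-01-01", "gender": "F"},
        {"member_id": "BAD",       "birth_date": "",           "gender": "X"},
    ]
    for element, result in certify(sample).items():
        print(element, result)
```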
I like to describe a CoE using an image of three concentric circles. The largest circle is the whole organization. The CoE is a small circle within the organization.
The CoE identifies best practices, tools, and processes. It tests them and recommends how to adopt them. In some cases, CoE members may be directly responsible for the activities for a period, but to make them stick, the CoE must ensure that other parts of the organization adopt them. So a CoE will often have a program of training and support.
[Diagram: the Data Quality Centre of Excellence as three concentric circles within the wider organization]
A third circle lies between the wider organization and the CoE; it comprises those who adopt the best practices and expand their use within the organization. One goal of the CoE is to widen this middle circle so that the improvements the CoE advocates spread throughout the organization.
Just as importantly, as these practices spread, so do the principles needed to create and maintain high-quality data. Over time, the CoE’s activities will change, and its evolution will depend on the rate at which the organization adopts change.
Dylan: How are you staffing the CoE?
Laura: Within the CoE, we have several roles:
Data analysts
Technicians
Process analysts
Communications/reporting
Data analysts are driven by their interest in data itself. They need to be able to understand data patterns, risks associated with data, sources of data errors, and the like. They also need a solid knowledge of the data domains in which they work.
Technicians provide support for analysis through knowledge of the tools we use and their capabilities. The goal of the technicians is not just to ensure the tools can be used, but to help the organization get more value out of the toolset. Our team has pretty deep expertise in our data profiling tool. They have implemented several improvements in how we execute our data profiling.
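To give a sense of what column-level profiling involves, here is a minimal, tool-agnostic sketch of the kinds of statistics a profiling run produces (row counts, null counts, distinct values, frequent values); the sample data is invented:

```python
# Illustrative column-profiling sketch in the spirit of the profiling work
# described above; it is not tied to any particular commercial tool.
from collections import Counter

def profile_column(values):
    """Compute basic profile statistics for one column of data."""
    non_null = [v for v in values if v is not None and v != ""]
    freq = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(freq),
        "most_common": freq.most_common(3),  # top values and their counts
    }

if __name__ == "__main__":
    # Invented sample: the lowercase "ct" is the kind of anomaly profiling surfaces.
    states = ["CT", "CT", "NY", None, "ct", "", "MA"]
    print(profile_column(states))
```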
Right now, I have a great team of data analysts and technicians. I want to develop our process analysis skill set. Process analysis is vital to finding the root causes of problems and to proposing improvements.
I have recently brought on a communications/reporting lead. This role is critical to the success of the CoE. The reporting lead is responsible for partnering with corporate communications to ensure that the wider organization knows what the CoE has accomplished and how the work has impacted the organization. At this early stage, the reporting function is serving a dual role: it is also evaluating the CoE’s own processes, because in order to report on our progress, we need to have clear, measurable processes.
Dylan: How do you see the staffing model evolving as the CoE starts to spread its influence across the organisation?
Laura: We will need to shift from a focus on identifying and testing new practices to helping make those practices stick in the organization. This is the adoption part.
Getting a high adoption rate for practices requires consulting skills, including the ability to train people but, just as importantly, the ability to realign rewards and incentives and to persuade people to change their behavior when creating or using data, or when developing data-driven applications.
So we will build out the team with people who can work with business and IT teams on implementing process improvements.
At the same time, since the data space is evolving, we will need to stay on top of emerging opportunities.
Dylan: Aside from Data Profiling tools - what are you hoping to have in your CoE toolset?
Laura: We have two other significant pieces of functionality.
We use the IBM InfoSphere tool suite and are building out data lineage within it. Lineage provides a way to understand the data ecosystem and to identify potential risks. It is also invaluable when data problems arise: being able to trace lineage lets us identify where problems have entered a system.
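As an illustration of the idea, tracing lineage upstream can be modeled as a walk over a directed graph of systems. The system names and edges below are invented; in practice the graph would come from a metadata/lineage tool such as the one described:

```python
# Hypothetical sketch of tracing data lineage upstream to find where a
# problem may have entered. System names and edges are invented.
UPSTREAM = {
    "report":      ["warehouse"],
    "warehouse":   ["staging"],
    "staging":     ["claims_feed", "member_feed"],
    "claims_feed": [],
    "member_feed": [],
}

def trace_upstream(node, graph):
    """Walk the lineage graph from a node back to its original sources."""
    seen, stack = [], [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return seen

print(trace_upstream("report", UPSTREAM))
# ['report', 'warehouse', 'staging', 'member_feed', 'claims_feed']
```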
In addition, our Quality Assurance area has adopted an automated testing tool that can do field-by-field comparisons in a fraction of the time it takes to do them manually. This kind of tool can help in the process of building quality into systems, because it provides a very comprehensive view of the data.
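A field-by-field comparison, at its simplest, matches records by key and reports any field whose values differ between source and target. The sketch below is a toy version of what such a testing tool automates at scale; the record layout and key name are assumptions:

```python
# Simplified illustration of automated field-by-field comparison between a
# source and a target dataset, keyed by record ID. Field and key names are
# hypothetical; commercial testing tools do this at much larger scale.
def compare(source, target, key="id"):
    """Report field-level mismatches between two sets of records."""
    target_by_key = {r[key]: r for r in target}
    mismatches = []
    for src in source:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            mismatches.append((src[key], "<missing in target>", None, None))
            continue
        for field in src:
            if src[field] != tgt.get(field):
                mismatches.append((src[key], field, src[field], tgt.get(field)))
    return mismatches

source = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 55.5}]
target = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 55.0}]
print(compare(source, target))  # [(2, 'amount', 55.5, 55.0)]
```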
Dylan: What does the roadmap look like for the CoE?
Laura: As I noted above, the CoE’s activities will change over time.
Right now, we are establishing and testing best practices with respect to data assessment, production monitoring, and issue management. We are shoring up our data profiling practice and ensuring we can report on the results in ways that are meaningful to our data consumers.
A year from now, I envision that we will be issuing regular, monthly reports on the quality of critical data elements, as well as on team activities. Two years from now, I think our data consumers will have significantly improved confidence in the quality of our data.
We are working with production support teams on best practices related to monitoring the quality of data in production systems. A year from now, we will have in place standards for minimum measurement requirements. In two years, I hope the standards will be widely implemented.
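One plausible example of a minimum measurement requirement is checking each day’s record count for a feed against its recent history and flagging large deviations. The counts and threshold below are illustrative assumptions, not an actual standard:

```python
# Minimal sketch of one plausible "minimum measurement": comparing today's
# record count for a data feed against recent history and flagging large
# deviations. Counts and the sigma threshold are illustrative assumptions.
from statistics import mean, stdev

def count_within_tolerance(history, todays_count, sigmas=3.0):
    """Flag a load whose record count deviates sharply from recent loads."""
    mu, sigma = mean(history), stdev(history)
    deviation = abs(todays_count - mu)
    return deviation <= sigmas * sigma, mu, deviation

history = [10_200, 9_950, 10_340, 10_105, 9_880]  # prior daily loads
ok, avg, dev = count_within_tolerance(history, 4_300)
print(f"ok={ok}, average={avg:.0f}, deviation={dev:.0f}")
# ok=False: today's load is far below the historical average
```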
We are working in partnership with our data stewards to share findings and prioritize issues. By this time next year, I think we will have established a clear, dependable issue management process and will be able to demonstrate improvement in several nagging issues. In two years, I hope we will be able to show a reduction in the overall number of issues reported.
Dylan: Finally, how will the success of the CoE be measured? What type of metrics are you planning to use to chart your progress?
Laura: Success is measured first by the build-out of the program and its coverage. So we are measuring how much data we have profiled and what our findings are. From there, we will establish a data certification program that will measure the degree to which we have ensured the quality of critical data elements within critical systems. This program will also allow us to show how the quality of data improves over time.
Another set of measurements will be connected with the issue management process. We will measure the overall number of issues but, more importantly, we will track the rate at which issues are closed and the impact of closing them.
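As a rough sketch, the closure-rate measurement might be computed along these lines; the issue records here are invented for illustration:

```python
# Hedged sketch of the kind of issue-management metrics described: overall
# counts, closure rate, and time to close. The issue records are invented.
from datetime import date

issues = [
    {"opened": date(2014, 1, 5),  "closed": date(2014, 2, 1)},
    {"opened": date(2014, 1, 20), "closed": None},  # still open
    {"opened": date(2014, 2, 3),  "closed": date(2014, 2, 28)},
]

def issue_metrics(issues):
    """Summarize issue volume, closure rate, and average days to close."""
    total = len(issues)
    closed = [i for i in issues if i["closed"] is not None]
    avg_days = (
        sum((i["closed"] - i["opened"]).days for i in closed) / len(closed)
        if closed else None
    )
    return {
        "total_issues": total,
        "closed_issues": len(closed),
        "closure_rate": len(closed) / total if total else 0.0,
        "avg_days_to_close": avg_days,
    }

print(issue_metrics(issues))
```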
The overall set of measurements should provide a level of assurance about the condition of data. In conjunction with our data stewardship process, it should also give our data consumers a way to provide feedback to the data quality program that can ensure that the program focuses on data that is most important to the enterprise.
About Laura Sebastian-Coleman
Laura Sebastian-Coleman, Data Quality and Data Standards Center of Excellence Lead at Cigna, has worked on data quality in large health care analytic data warehouses since 2003.
Cigna is a global health service company dedicated to helping people improve their health, well-being and sense of security. Laura has implemented data quality metrics and reporting, launched and facilitated data quality working groups, and contributed to data consumer training programs.
She has led efforts to establish data standards and to manage metadata for large analytic data warehouses. In 2009, she led a group of analysts at Optum in developing the original Data Quality Assessment Framework (DQAF) which is the basis for her book Measuring Data Quality for Ongoing Improvement (Morgan Kaufmann, 2013).
Link to book website: http://store.elsevier.com/product.jsp?locale=en_US&isbn=9780123970336