One of the single biggest causes of data defects in any organisation is poor quality information entered at the start of the information chain via data entry interfaces.
This article provides some simple, practical and cheap techniques to dramatically improve the data quality of human-entered information.
Thanks to Henrik Sørensen for contributing to this feature.
Tips for Low-Cost Data Entry Data Quality Improvement
1. Take a minimalist approach
Keep it simple: do you really need all that data?
A lot of web forms or application forms are simply bundled as default but quite often a lot of the data entry fields are redundant. The problem is that the system may still ask the user to tab through them, frustrating the whole experience.
You should periodically review exactly which data is relevant, which is nice to have and which is obsolete. Design your forms with this kind of ranking in place and make it easy for users to complete only the data they absolutely need to get entered.
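As a rough illustration of the ranking described above, a form definition could carry a priority for each field and the interface could use it to decide what to show. This is only a sketch: the field names, the `Priority` levels and the `fields_to_display` helper are all hypothetical, not part of any particular form framework.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    REQUIRED = "required"          # must be captured
    NICE_TO_HAVE = "nice_to_have"  # offered, but optional
    OBSOLETE = "obsolete"          # no longer shown at all


@dataclass(frozen=True)
class Field:
    name: str
    priority: Priority


# Hypothetical form definition; the field names are illustrative only.
CUSTOMER_FORM = [
    Field("name", Priority.REQUIRED),
    Field("fax_number", Priority.OBSOLETE),
    Field("job_title", Priority.NICE_TO_HAVE),
    Field("email", Priority.REQUIRED),
]


def fields_to_display(form):
    """Drop obsolete fields and list required fields first."""
    visible = [f for f in form if f.priority is not Priority.OBSOLETE]
    # sorted() is stable, so fields keep their relative order within each group.
    return sorted(visible, key=lambda f: f.priority is not Priority.REQUIRED)
```

Reviewing the form then becomes a matter of re-ranking fields in one place rather than redesigning the screen.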
2. Profile the good and the bad
Jim Harris penned a great post recently talking about his experience of living in a zip code that doesn't exist.
When a customer or worker enters data into a form are you recording the valid data AND the previous data that failed? The defects are just as important as the accepted data.
By collecting both sets of information and profiling them, we instantly create a picture of where our data entry process needs to be improved. If you don't store that information, you're missing out on potential new business, so there is a clear commercial driver to do this.
Even in valid, accepted data, we will find scores of examples where data is incorrectly entered or where fields are regularly misused. This information flows into downstream processes and creates extra work, costs and bad decisions.
You now have no excuse not to begin profiling your data entry data and using this intelligence to design new data entry processes.
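A minimal sketch of this kind of profiling might look like the following. It assumes every submission attempt is logged together with a flag saying whether validation accepted it. The log layout and the `rejection_rates` helper are hypothetical and would need adapting to however your forms record their data.

```python
from collections import Counter

# Hypothetical log of form submissions: every attempt is kept,
# including the ones that failed validation.
submissions = [
    {"field": "zip_code", "value": "12345", "accepted": True},
    {"field": "zip_code", "value": "1234", "accepted": False},
    {"field": "zip_code", "value": "N/A", "accepted": False},
    {"field": "phone", "value": "555-0100", "accepted": True},
    {"field": "phone", "value": "none", "accepted": False},
    {"field": "email", "value": "a@example.com", "accepted": True},
]


def rejection_rates(log):
    """Return {field: share of attempts rejected}, worst field first."""
    attempts = Counter(entry["field"] for entry in log)
    rejected = Counter(entry["field"] for entry in log if not entry["accepted"])
    rates = {field: rejected[field] / attempts[field] for field in attempts}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))
```

Fields at the top of the result are the first candidates for a redesign, and the rejected values themselves show what users were trying to enter.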
3. Design helpful forms
I recently tried to enter some personal information into a tax return form which failed to accept the data. The form returned an error at the top of the page, using the particularly helpful phrase: "Error in form, please correct". Of course I had no choice but to persist, otherwise a hefty fine beckoned.
However, with business forms, if we feel helpless, then we walk away.
If a form fails to validate your information then it should be designed in such a way that it guides you through the correction process.
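One way to guide the user is to return a specific message for each failing field rather than a single page-level error. The sketch below assumes a hypothetical rule table and `validate` function. The email check is deliberately loose and is only meant to show the shape of the idea.

```python
import re

# Hypothetical rules: (field, check, message shown next to that field).
RULES = [
    ("email",
     lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
     "Enter an email address such as name@example.com."),
    ("postcode",
     lambda v: v.strip() != "",
     "Postcode is required."),
]


def validate(form_data):
    """Return {field: message} for each field that fails its check."""
    errors = {}
    for field, check, message in RULES:
        if not check(form_data.get(field, "")):
            errors[field] = message
    return errors
```

The interface can then display each message beside the field it refers to, so the user knows exactly what to correct.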
Also, why not add basic support and assistance:
- How many web forms do you see that don't provide a feedback or help option easily visible next to a data entry form?
- Do you have a visible call support number, email or Wiki page where your data entry workers can get help or advice?
These are just two simple ways to improve the user experience and quality of inbound data to your business.
4. Prevention is better than cure
Data cleansing downstream from the point of entry is costly, repetitive, time-consuming and error-prone. It can increase service lead-times and add unnecessary complexity to your information chains.
Look at the typical data cleansing functions you have implemented to cope with poor quality data in your business. These are typically either automated in software or performed manually by data workers.
Identify some quick wins:
- If you are standardising country names wouldn't it be easier to create a drop-down list based on accurate reference data?
- If the part codes are entered in a myriad of weird and wonderful formats and require pattern recognition and cleansing to standardise them, wouldn't it be easier to enforce standards on entry?
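Both quick wins can be sketched in a few lines. The country list and the part-code standard below (two capital letters, a hyphen, four digits) are invented for illustration. A real form would draw on maintained reference data and on whatever standard your business actually uses.

```python
import re

# Hypothetical reference data; a real list would come from a maintained source.
COUNTRIES = {"US": "United States", "DK": "Denmark", "GB": "United Kingdom"}

# Hypothetical part-code standard: two capital letters, a hyphen, four digits.
PART_CODE = re.compile(r"[A-Z]{2}-[0-9]{4}")


def country_options():
    """Options for a drop-down, so users pick a value instead of typing one."""
    return sorted(COUNTRIES.items(), key=lambda item: item[1])


def accept_part_code(raw):
    """Return the standardised code, or None if it cannot be accepted."""
    candidate = raw.strip().upper()
    return candidate if PART_CODE.fullmatch(candidate) else None
```

Here harmless variations such as stray spaces or lower case are tidied up on entry, while anything that does not fit the standard is rejected before it reaches the database.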
Take for example the excellent open source DataCleaner product that we reviewed and created a tutorial for. This provides pattern analysis logic in a Java format which you can re-use in your applications.
Data Quality Pro members also get free ASCII pattern analysers for Oracle, VB Script and SQL Server. Why not use the VB code at the interface layer or the database functions down in the database transaction layer to validate the data?
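For readers unfamiliar with pattern analysis, the general idea can be sketched as follows. This is not the DataCleaner logic or the Data Quality Pro analysers, just a generic illustration with hypothetical function names: each letter becomes `A`, each digit becomes `9`, and everything else is kept, so values with the same shape produce the same pattern.

```python
from collections import Counter


def pattern_of(value):
    """Map ASCII letters to 'A', ASCII digits to '9', keep other characters."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalpha():
            out.append("A")
        elif ch.isascii() and ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def pattern_profile(values):
    """Count how often each pattern occurs, most common first."""
    return Counter(pattern_of(v) for v in values).most_common()
```

Run over a column of part codes, the profile shows the dominant format at the top and the stray formats beneath it, which is a useful starting point for deciding what to enforce on entry.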
If you cannot prevent defects at source then ensure you have routines as close to the source as possible to trap defects before they flow into the business. Whenever you build an error-checking routine or clean-up process ensure that you have a feedback loop to the form designers so that logic can be implemented at source.
5. Gather user feedback and act on it
Here is a wild and crazy notion: why not ask users and customers what they think of the data entry process?
Several years ago I consulted on a project where two systems were increasingly suffering from poor data quality and becoming inconsistent as a result. It was clear that the data being entered by the field staff was a major cause of the issue. I followed an incredibly simple process of:
- Profiling the entered data (see point 2) to identify the defect hotspots
- Mining comments data which was found in the entered record (this found countless examples of frustrated field workers)
- Listening to field staff who had entered poor quality data
- Re-engineering forms and interfaces to meet their needs
Common sense, I know, but the client had already commenced a complex data cleansing process. It simply had never occurred to them that there was a simple reason behind the poor quality data flowing into their business. The forms were poorly designed, cumbersome, error-prone and didn't suit the working patterns and habits of field workers.
Solving data quality means going to the root of the source and implementing preventative measures that increase the satisfaction of the user experience. By creating simple, helpful, intuitive and preventative controls at the data entry source we can dramatically cut costs and complexity throughout the business.
There are some additional resources below to give you some further information on how to improve your data entry processes.
6. Have an error tolerant search
A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to search the database for a match first and, if the entity is not found, to create a new entity. A search that fails to find a match that actually exists is a classic and frequent cause of duplicates, or at best it leaves any real-time duplicate checking to catch the problem.
An error tolerant search is able to find matches despite spelling differences, differently arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
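A very small sketch of the idea, using only Python's standard `difflib` module, is shown below. The `normalise` and `search` helpers and the 0.8 threshold are assumptions made for illustration. Commercial matching tools use far more sophisticated techniques, and the threshold would need tuning against real data.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, drop punctuation, and sort words so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def search(query, candidates, threshold=0.8):
    """Return (score, candidate) pairs at or above threshold, best first."""
    target = normalise(query)
    scored = [
        (SequenceMatcher(None, target, normalise(c)).ratio(), c)
        for c in candidates
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

With this in place, a search for "John Smith" also finds a record stored as "Smith, John", so the user is shown the existing entity before being offered the option to create a new one.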
7. Verify or select data from external reference data
This goes a step further than the reference list of country names and codes mentioned in point 4.
Say you are going to add a business entity in your customer table. Instead of typing name, address and other data you may plug in to a business directory and select the entity from there. You will have the following advantages:
- Less typing (fewer errors)
- Data from reliable (official) source
- Possibility of ongoing updates
As the term "low cost" is included in the title, here are some tips on the possible costs of such solutions:
- Prices for external reference data are decreasing, although there are huge differences between countries. Check the market regularly to see whether the ROI is turning positive in your markets and within your financial scope.
- Solutions for searching and reference data integration may be rented, so the gains can more than pay for the costs.