When a company is comparing OCR versus manual data entry as a solution for converting paper to electronic media, there are really only three factors to consider – accuracy, implementation difficulty, and cost. These factors might seem relatively easy to compare, but when you’re faced with a deluge of marketing pitches from OCR and data entry vendors touting the benefits of their various solutions, it tends to get a bit murky. As a buyer of these kinds of solutions and services, it’s on you to make the right decision – a poor decision can be disastrous for your company, and your career. We’re going to break down the pros and cons of the three prevailing methodologies here, and clue you in to the pitfalls to look out for as you conduct your evaluation.
OCR is “Optical Character Recognition,” a process whereby a document is scanned, and a software program reads the imaged document, trying to make out the individual characters, and returns the data from the document in an editable, digital format. On the surface, OCR doesn’t seem to be rocket surgery…but a commercial OCR solution can be incredibly complex. How does a software program tell the difference between a “1” and an “l”? A comma and a period? A “0” and an “O”? It all comes down to context, and to what a reader would expect to see in a given sample. If you’re expecting a phone number in a given field, for example, you would only expect to see numerals. If you program a piece of software to only expect numbers in a given field on a scanned form, it will be significantly more accurate than if you do not give it those “context clues.” Now, take a complicated form like an insurance claim for a hospital visit, with dozens upon dozens of fields, all expecting different strings of data in different places, with numbers and alpha characters intermixed…it’s easy to see how many hundreds of rules must be built into an OCR system to accommodate those kinds of forms.
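To make the “context clues” idea concrete, here is a minimal sketch of a single numerals-only rule for a phone number field. The look-alike table and the function name are our own illustration, not any vendor’s actual rule set, and a real system would need hundreds of rules like it.

```python
import re

# Hypothetical look-alike table: letters that are easily mistaken for digits.
LOOKALIKE_TO_DIGIT = {"l": "1", "I": "1", "O": "0", "o": "0", "S": "5", "B": "8"}


def read_numeric_field(raw: str) -> str:
    """Apply a 'numerals only' context rule to the raw OCR text of one field."""
    # Because the field should only hold numerals, look-alike letters are
    # assumed to be misread digits and are swapped back.
    corrected = "".join(LOOKALIKE_TO_DIGIT.get(ch, ch) for ch in raw)
    # Allow digits plus common phone-number punctuation; anything else
    # cannot be resolved by this rule and goes to a person.
    if not re.fullmatch(r"[0-9()\-. ]+", corrected):
        raise ValueError(f"field needs human review: {raw!r}")
    return corrected


print(read_numeric_field("(555) 0l2-34O7"))  # (555) 012-3407
```

Without the rule, the software has no reason to prefer “1” over “l” in that field. With it, the ambiguity disappears, and anything the rule cannot resolve is flagged for review rather than guessed.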
None of this is to imply that OCR is “bad” – OCR has come a long way, and many OCR vendors have invested hundreds of millions of dollars into their systems – some are now claiming as high as 98% accuracy, and even as high as 99% when combined with “vertext” processes (human text verification). If you’ve ever tried to use a downloadable OCR application to read a page, you’ll immediately recognize that 98% accuracy is quite a feat.
What OCR is not (yet) is a replacement for human decision making in paper conversion. It is an assistive methodology, one that requires human intervention to resolve errors, inconsistencies, and illegible original data. Commercially available OCR cannot, as of yet, read handwriting with any reliable degree of accuracy. It cannot easily differentiate between a field label and the data in a field without extensive programming and absolutely consistent text positioning within a field. (This is why many OCR forms are red-line forms – the field names and boxes are printed in red ink, and the form data in black ink, allowing the software to ignore red-colored input, and only read black-colored input.) This means photocopied forms, which typically lose that color distinction, are incredibly difficult for OCR programs to handle.
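The red-line trick can be sketched in a few lines. This is a simplified illustration, assuming an image is just rows of (red, green, blue) pixel values and that “red ink” means the red channel clearly dominates. The threshold and function names are hypothetical, and commercial scanners handle this far more carefully.

```python
WHITE = (255, 255, 255)


def is_red_ink(pixel, margin=60):
    """Treat a pixel as red ink when red clearly dominates green and blue."""
    r, g, b = pixel
    return r - max(g, b) > margin


def drop_red(image):
    """Blank out red pixels so only the black form data is left to read."""
    return [[WHITE if is_red_ink(p) else p for p in row] for row in image]


# One row of a color scan: a red box edge next to black typed data.
color_scan = [[(200, 30, 30), (10, 10, 10)]]
# The same row after a black-and-white photocopy: the box edge is now gray.
photocopy = [[(90, 90, 90), (10, 10, 10)]]

print(drop_red(color_scan))  # the red pixel becomes white, the black one stays
print(drop_red(photocopy))   # nothing is removed, so the box edge remains
```

On the photocopy the filter has nothing to key on, so the labels and boxes stay in the image and get read right along with the data.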
If you’ve ever had to copy data from a page to a word processing program, you’ll know that you’re bound to make some mistakes. Our brains tend to translate what we read into something we expect to have read, and our expectations are not always accurate. Not to mention our propensity towards typos. Even the best typists make mistakes. Manually entering data into a spreadsheet, database, or other electronic format is a process that can be riddled with errors if the methodology is not scientifically organized.
Many companies employ data entry personnel to enter data from forms into their information systems. To use the insurance industry as an example again, a data entry operator may need to type fifty or a hundred fields into a database when they are preparing to adjudicate a healthcare claim. If the methodology is to directly type data from the page into a system, errors will result. In-house manual data entry is NOT perfect.
There are methodologies, however, that can virtually eliminate errors. For example, Eagle uses a double-blind keying methodology with database assistance, whereby a document is divided into zones, and two separate individuals type the data from the fields in those zones into a database. (The zoning allows for a typist to focus on one type of data – numeric for example, or alpha data, reducing potential errors.) Those two entries are then electronically compared, against each other and against expected data (which may be based upon historical typing records, strictly controlled possible input values, or, for example, a database of mailing addresses). When a discrepancy is found, the data is “kicked out” to a verifier, who must evaluate the typed data against the form and make a determination as to which version (or neither) is accurate, and make appropriate corrections. A maximum-accuracy approach like this is extremely labor intensive, but it’s as good as it gets – typically well over 99.9%.
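The compare-and-kick-out step can be sketched as follows. This is a simplified illustration of the general double-key idea, not Eagle’s actual system. The field names, the sample values, and the `allowed` lookup of permitted values are all assumptions made up for the example.

```python
from typing import Optional


def compare_keyed_entries(first: dict, second: dict, allowed: Optional[dict] = None):
    """Compare two independent keyings of one document.

    Returns (accepted, kicked_out): accepted maps field name to the agreed
    value, and kicked_out lists the fields a verifier must check by hand.
    """
    allowed = allowed or {}
    accepted, kicked_out = {}, []
    for field in sorted(set(first) | set(second)):
        a, b = first.get(field), second.get(field)
        valid = allowed.get(field)
        if a != b:
            kicked_out.append(field)  # the two typists disagree
        elif valid is not None and a not in valid:
            kicked_out.append(field)  # they agree, but the value is unexpected
        else:
            accepted[field] = a
    return accepted, kicked_out


typist_one = {"member_id": "A10442", "state": "MN", "amount": "125.00"}
typist_two = {"member_id": "A10442", "state": "MM", "amount": "125.00"}
expected = {"state": {"MN", "WI"}}  # hypothetical list of permitted values

accepted, kicked_out = compare_keyed_entries(typist_one, typist_two, expected)
print(accepted)    # {'amount': '125.00', 'member_id': 'A10442'}
print(kicked_out)  # ['state']
```

The point of the design is that an error only slips through when both typists make the same mistake on the same field and the result still looks like valid data, which is far less likely than a single typo.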
Comparing the accuracy of OCR versus manual data entry, however, doesn’t just come down to how many 9s a vendor claims. It’s important when evaluating accuracy to make sure you’re comparing apples to apples.
When OCR vendors talk about accuracy, they are typically talking about character-level accuracy. Meaning, out of 100 characters, how many did their software convert accurately? A 98 percent rating means they have made two errors out of 100 characters, on average. That sounds pretty good, until you dig deeper. If a typical field contains, say, 6 characters, that means that there are roughly 16 fields represented in those 100 characters. Two errors could mean that 2 out of 16 fields were converted inaccurately – that means upwards of 12% of all fields may contain an error.
Even with 98% field-level accuracy, you’re still talking about a lot of errors. That would mean that out of 100 fields, two of them would contain an error. If your document contains 50 fields, then on average you should expect one error in every document. With 25 fields, expect an error in roughly every other converted document. And so forth.
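If you want to run these numbers for your own forms, the arithmetic is easy to sketch. The functions below are our own illustration and assume that errors are independent and evenly spread, which real scans rarely are, so treat the results as rough estimates.

```python
def field_accuracy(char_accuracy: float, chars_per_field: int) -> float:
    """Chance that every character in a field is right (independent errors)."""
    return char_accuracy ** chars_per_field


def expected_errors_per_document(field_acc: float, fields_per_doc: int) -> float:
    """Average number of bad fields in one document."""
    return (1 - field_acc) * fields_per_doc


def share_of_documents_with_error(field_acc: float, fields_per_doc: int) -> float:
    """Chance that a document has at least one bad field (independent errors)."""
    return 1 - field_acc ** fields_per_doc


# 98% character-level accuracy on 6-character fields
print(round(field_accuracy(0.98, 6), 3))
# 98% field-level accuracy on a 50-field document
print(round(expected_errors_per_document(0.98, 50), 2))
print(round(share_of_documents_with_error(0.98, 50), 2))
```

Under these assumptions, 98% character-level accuracy works out to a little under 89% of 6-character fields being fully correct, and 98% field-level accuracy on a 50-field form averages one bad field per document. Always ask which level of accuracy a vendor is quoting before you compare numbers.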
As of yet, OCR can’t compare to a robust, double-key-and-verify manual data entry solution. One Eagle client conducted an audit of the accuracy of our P2E solution for healthcare claims, and found 6 errors out of 10,000 fields. That’s a 99.94% field-level accuracy rating. If the accuracy of the data you’re processing is important, OCR might not be the best choice for your organization.
Developing an In-House OCR Solution
If you are considering developing an in-house, high-volume solution for OCR, be prepared to spend anywhere from hundreds of thousands to several million dollars, and plan for an implementation process that runs well over a year and possibly three or more. You’ll need to modify your mailroom sorting process, evaluate and purchase scanning equipment, decide on a maintenance provider, train personnel, develop workflows for handling illegible or un-OCRable documents – and that’s all before the software even sees your documents. You’ll need to evaluate all of your inbound forms, and your IT department will need to program OCR solutions for each of them (and their variations). You’ll need to select an OCR software provider and buy the appropriate licenses. You’ll need to develop an onboarding process for new documents, a quality control and verification team to handle OCR software output, and create integrations to your information systems. Bottom line, unless your volume is unfathomably high, you’re unlikely to have something up and running any time soon, nor see an ROI on an in-house solution for a very, very long time.
Outsourced OCR – image (scan) yourself, or outsource the mailroom?
That leaves outsourcing, and if you’re going to outsource, whether it’s to a data entry company or an OCR company, you need to decide first whether you’ll be scanning in-house, or outsourcing the entire mailroom operation. Outsourcing the mailroom would mean having your mail sent directly to the vendor, or boxing up your documents and shipping them as they come in. There are considerable risks and costs associated with outsourcing the scanning of your documents, the biggest of which is what you’ll do if your vendor relationship goes sour. If things go badly, and you’ve outsourced everything, you’ll be in a tough spot trying to bring your documents back in-house. You can also assume that the cost of outsourcing your scanning is going to run roughly twice what it would cost you to handle it in-house (your vendor is going to want a margin).
Many companies prefer to image their documents in-house, and send them electronically to their vendor for processing. This allows you the flexibility of determining how much volume you want to send to a single vendor, which documents you want to handle internally, and, most importantly, gives you control over the originals. Even if you’ll be shredding them, you may want the peace of mind of not letting your documents out-of-house until you know they’ve been imaged and indexed.
Now, remember that part above about what is involved in setting up an in-house OCR solution? All that work still needs to be done – only in an outsourced environment, it needs to be done by your vendor – and every single line item is going to be marked up, considerably. Some OCR vendors charge considerable setup fees for these services, particularly if there are non-standard forms being converted. Even with standard forms, however, you’ll have your own business rules you’ll want to have applied to each document type, and each data element within each document. Your information systems may not be prepared to handle the data format your OCR vendor is returning to you, and custom programming will be needed during the onboarding process, whenever new documents are added, whenever forms change, and whenever your information system changes. Be sure to account for these potential costs when evaluating your solution, and don’t be overly optimistic – they can add up quickly and unexpectedly.
Outsourced Manual Data Entry
If you’re looking for the highest accuracy solution available, quick(er) implementation, and limited setup costs, outsourced manual data entry is the way to go – but do your homework. The vast majority of these kinds of firms utilize offshore labor (double-keying at US prices is flat-out cost prohibitive), and that’s fine – so long as their customer service and technical teams are onshore. We’ve all talked with Robert from Technical Services before while trying to get our internet back up – c’mon man, your name’s not Robert – and dealt with the frustration of language barriers, poorly connected internet VoIP calls, etc. The best outsourced data entry companies will appear, to you the customer, to be onshore operations, and all customer-facing interactions will be polished and streamlined. If you get the feeling during the buying process that you’re dealing with an offshore company, it will be many times more difficult to work with them once you’re past the sales team, and working with the day-to-day folks. There’s also the matter of the time difference – if your customer service teams are on the opposite side of the globe, expect 12-36 hour delays every time you have an issue that needs resolution.
Things to look for:
– A company established and incorporated in the US
– A company with a long US legacy
– A willingness to meet in person
Things to avoid:
– Foreign-held entities
– Limited legacy doing business in the U.S.
– Delayed interactions during the buying process
None of this is to imply that a company headquartered in the Philippines or India won’t work hard for you – but the rubber meets the road after the sales process, and you’ll want to know that your technical folks and mailroom users will have the simplest time interacting with your vendor, in a timely and easy manner.
We’ve discussed several options here, and each comes with its own set of benefits and risks. Here are the various options and their associated costs:
Developing an In-House OCR Solution
Setup cost: Highest (hundreds of thousands to millions of $)
Ongoing costs: Vary with complexity and quality control requirements
Main cost drivers: Software licensing, technical customization, hardware, onboarding new document types, quality control
Outsourced OCR
Setup cost: Medium (usually in the tens of thousands)
Ongoing costs: Lowest on average
Main cost drivers: Per-document transaction fees, quality control, costs of data errors, customization of EDI and new document types
Outsourced Manual Data Entry
Setup cost: Lowest (zero setup costs with Eagle, for example)
Ongoing costs: Comparable to outsourced OCR, per-transaction fees average 20% higher, customization lower (or with Eagle, free)
Main cost drivers: Per-document transaction fees
If it seems like we’re biased towards an outsourced manual data entry solution, well, we are, and for good reason. OCR technology, while improving, just isn’t quite there yet, and it doesn’t take much digging around to find horror stories of companies that have tried OCR solutions – both in-house and outsourced – only to lose millions of dollars due to poorly managed implementations, ongoing maintenance costs, and accuracy problems. Eagle Innovations’ solution for paper-to-electronic conversion costs nothing to set up, has no volume minimums, includes nearly unlimited data and business rules customization, unparalleled accuracy, and can be implemented in as little as seven days.
Yeah, we’re a little biased. Contact us today and let’s see if we can help you solve your data entry challenges.