Recordsforce Blog

Why OCR and Data Capture Software is NOT the Answer

Posted by Bill Becker on Sep 1, 2020 9:23:50 AM


For over 20 years, companies have been selling different versions of OCR and data capture software that promise to replace data entry by your employees. We have bought the products, hoping that one would eventually live up to the hype. Automating data capture tasks would be extremely valuable to a company like ours. But so far, the data capture service our company offers has always required more than just OCR and data capture software to be viable. Here’s what we’ve learned from decades of failed attempts to use OCR capture software as the solution for data capture.

First, let’s be clear on what we’re talking about. OCR stands for Optical Character Recognition: software that reads an image, such as a TIFF or a PDF, and translates it into actual letters and words, like a text document. The next generation of OCR was forms-based capture, which combined OCR with templates of forms, or profiles, that could be used to capture data from highly structured documents.

Next came intelligent data capture (IDC), a reaction to the shortcomings of forms-based capture. IDC combines OCR with profile rules that search the OCR output for key words and word combinations that often appear near the value you want. For instance, most invoices have the words “Invoice Number,” or a variation of them, above or to the right of the actual invoice number. The field instructions tell the computer to read the OCR output and look for any of the word combinations the human operator has programmed. This can lead to many complex and sometimes competing rules, which can make managing the system more effort than doing the work by hand.
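To make the idea concrete, here is a minimal sketch of a keyword-proximity rule in Python. It is not taken from any vendor's product: the label list, the function name, and the assumption that the value immediately follows its label in the OCR text are all ours, chosen only for illustration.

```python
import re

# Hypothetical label variants an operator might program for the
# "invoice number" field. Real profiles carry many more of these.
LABEL_VARIANTS = ["invoice number", "invoice no", "invoice #", "inv #"]


def find_invoice_number(ocr_text):
    """Return the token that follows the first matching label, or None.

    Assumes (for this sketch only) that the value comes right after
    the label in the OCR output.
    """
    for label in LABEL_VARIANTS:
        # Label, optional period, optional ":" or "#", then the value.
        pattern = re.escape(label) + r"\.?\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*)"
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


print(find_invoice_number("Invoice Number: INV-10432"))
print(find_invoice_number("Invoice Notes: pay within 30 days"))
```

The second call shows how quickly such rules misfire: the "invoice no" variant also matches the start of "Invoice Notes" and captures "tes" as the invoice number. Fixing that means adding yet another rule, which is how the competing rule sets described above pile up.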

Today, companies are selling artificial-intelligence-based data capture, which combines OCR with machine learning and deep learning to create a more complex and nuanced set of instructions that exceeds the capabilities of any of the earlier attempts. While modern AI-based data capture is impressive, it still cannot stand on its own as a way to eliminate data capture as a business process.

Why are none of these solutions acceptable as a way to eliminate your data capture? First, consider what accuracy you need from your capture efforts. Does your accounting system need the right payment amount 97% of the time, or 100%? We can tell you from 30 years of experience that cleaning up a batch of invoices from 96% to 99.5% accuracy with the AI software can take just as long as having a good data entry person key the data by hand. Still, even in 2020, we see that humans combined with the right systems can do amazing things. Treating OCR as an escape from human effort has led software makers to an all-or-nothing mentality, leaving a large gap in their solutions. If your software is going to make any mistakes, the interface for cleaning up those mistakes should be designed with the same aggressive speed and accuracy goals as the automated work.

Each time we try a new technology, we get to the cleanup process and find it is as if the software makers assume you won’t be using the cleanup system after a few months, so there’s no point in spending development time on it. The problem is that, at 97% accuracy at the character level, you will have to review 100% of your invoices to get to 99.5% accuracy. In doing that, you’ve spent as much time reviewing documents in the vendor’s review and validation system as you would have spent keying them from scratch, and keying from scratch would not have added the complexity and risk of a third-party software system operating in your environment. Strangely, this never seems to concern upper management, who see the system being used and never go back to check whether it really reduced the time spent processing documents.
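A quick back-of-the-envelope calculation shows why character-level accuracy forces a full review. The sketch below assumes character errors are independent, which real OCR errors are not exactly, and the field lengths are hypothetical numbers picked for illustration.

```python
def error_free_probability(char_accuracy, num_chars):
    """Chance that every one of num_chars characters is read correctly,
    assuming each character is right independently with char_accuracy."""
    return char_accuracy ** num_chars


CHAR_ACCURACY = 0.97       # character-level accuracy from the text
CHARS_PER_FIELD = 10       # hypothetical: one 10-character field
CHARS_PER_INVOICE = 50     # hypothetical: five such fields per invoice

field_ok = error_free_probability(CHAR_ACCURACY, CHARS_PER_FIELD)
invoice_ok = error_free_probability(CHAR_ACCURACY, CHARS_PER_INVOICE)

print(f"Field captured with no errors:   {field_ok:.1%}")
print(f"Invoice captured with no errors: {invoice_ok:.1%}")
```

Under these assumptions, roughly one field in four and nearly four invoices in five contain at least one wrong character. Since you cannot know in advance which ones, every invoice has to be looked at.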

These systems are generally extremely complex, requiring weeks of training to become proficient. The training is expensive and has no crossover value to other systems in your office. If you run into challenges or even find bugs, you are likely not a big enough customer to make the software maker change its software. In our experience, we have found completely unexpected limitations even in mature systems: simple things like monetary values being limited to two decimal places, when in our industry money frequently goes to six decimal places (e.g., gas prices).
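The two-decimal limitation is easy to underestimate, so here is a small illustration using Python's standard decimal module. The unit price, the quantity, and the function name are made-up values for this sketch, not figures from a real invoice or a real capture product.

```python
from decimal import Decimal, ROUND_HALF_UP


def line_total(unit_price, quantity, price_places):
    """Line total when the system stores the unit price with only
    price_places decimal places (rounding half up)."""
    step = Decimal(10) ** -price_places          # 0.01 for 2 places
    stored_price = Decimal(unit_price).quantize(step, rounding=ROUND_HALF_UP)
    return stored_price * Decimal(quantity)


# Hypothetical fuel invoice line: 10,000 gallons at $2.459375 per gallon.
exact = line_total("2.459375", 10000, price_places=6)
truncated = line_total("2.459375", 10000, price_places=2)

print("Six decimal places:", exact)
print("Two decimal places:", truncated)
print("Difference:        ", truncated - exact)
```

With these numbers, storing the price at two decimal places turns it into 2.46 and overstates a single line by 6.25. Across thousands of lines, that is not a rounding detail but a reconciliation problem.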

What Recordsforce has found in 20 years of experimentation and development is that the newest technology still only gets us nearly there. Once AI takes over the process, it can deliver results that are inaccurate and have no remedy. For instance, two invoices may be too similar for the AI to distinguish, yet different enough that no single profile will capture both accurately. In those cases, because you as the user of the AI software have no ability to reprogram its learning, it just continues to oscillate between alternative sets of mistakes, never arriving at the correct answer for both invoices. This, at scale, is a disaster. The software is highly confident it has answered correctly, but it can be completely wrong on multiple documents every day. The result is 100% review of indexing, which completely destroys the ROI on the system’s purchase.

How do we deal with this? Recordsforce has developed systems that, combined with our services, deliver exactly what OCR and capture software has promised and failed to deliver: 99.5% accuracy. We produce extremely accurate data and images for your workflow and processing needs without requiring any data entry or manual processing on your part. Our services remedy the limitations without introducing complexity or risk to your business. Through our highly controlled, SSAE-18 SOC II compliant operations, your records are in better hands at our office than at yours. Some technology is fast but not accurate; other technology is accurate but requires a lot of processing time. Recordsforce delivers both speed and accuracy.