The Death of Zonal OCR
In my recent travels, I made my way to Cincinnati for training on an OCR product line that is new to our company and fairly new to the United States. Our sales team has gained a lot of traction with this product, so we set out to become technical experts on it. I have worked with several OCR packages in the past, but none has been as exciting as this one. It is innovative in both functionality and methodology, and it is amazing how simple and logical software can be at times. By the end of the one-week training course, I had a revelation: Zonal OCR is dead. If you are using a Zonal OCR product, you may already be antiquated.
Traditional OCR methods
If you are using OCR as part of your document capture process, you are probably drawing boxes around fields or barcodes to capture index values, or perhaps capturing a single value and using it in a database lookup that populates the other values. This may work fine for some organizations, but traditional Zonal OCR has real limitations.
With a Zonal OCR package, you must create a template for every type of document that needs to be OCRed. If invoices from two different vendors vary at all, you need two templates. What if a vendor upgrades its invoicing software or switches packages? What if it buys cheaper paper stock to print on? Any of these may force a template change, so once again another template must be created. Recognition problems can also appear when the DPI of the scanned documents differs from the DPI the template was built on.
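To make that fragility concrete, here is a minimal sketch of how a zonal template behaves. It assumes the OCR engine has already returned each recognized word with a pixel bounding box; the `Word`, `Zone`, and `extract_zonal` names, the coordinates, and the invoice number are hypothetical and not taken from any particular product. The zone is defined in pixels on a 300 DPI scan, so the same word scanned at 200 DPI lands outside the box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its pixel bounding box (left, top, right, bottom)."""
    text: str
    box: tuple[int, int, int, int]


@dataclass(frozen=True)
class Zone:
    """A fixed rectangle, in pixels at the template's DPI, where a field is expected."""
    field: str
    box: tuple[int, int, int, int]


def inside(inner, outer):
    """True if the inner box lies entirely within the outer box."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def extract_zonal(words, zones):
    """Return {field: text} using only the words that fall inside each fixed zone."""
    result = {}
    for zone in zones:
        hits = [w.text for w in words if inside(w.box, zone.box)]
        result[zone.field] = " ".join(hits) if hits else None
    return result


# Hypothetical template for one vendor's invoice, drawn on a 300 DPI scan:
# the zone starts 1 inch down and 5.5 inches in from the left edge.
template = [Zone("invoice_number", (1650, 300, 2250, 390))]

# The same physical word on a 300 DPI scan and on a 200 DPI scan (2/3 the pixels).
scan_300 = [Word("INV-10442", (1700, 320, 1950, 370))]
scan_200 = [Word("INV-10442", (1133, 213, 1300, 247))]

print(extract_zonal(scan_300, template))
print(extract_zonal(scan_200, template))
```

The first call finds the invoice number and the second returns `None`. Rescaling coordinates by the DPI ratio could patch the resolution case, but a field that moves on the page would still fall outside the box.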
New technology
Imagine software intelligent enough to read what is on a document, classify that document, and extract index values without drawing any boxes or relying on barcodes, all within 500 milliseconds. You have now stepped into the world of Free Form OCR.
With artificial intelligence, we can train software to read a set of documents the way a human being would. The software learns the traits of those documents so that it can separate them by type. Once it has sorted them, it looks at each document again to extract data. Using the same approach, it looks more closely for keywords on the page and the values associated with them. For example, if I wanted to find an invoice's number, I would first look for a keyword like ‘Invoice Number’ or ‘Invoice No.’ and then look to the right of, or possibly below, that keyword for the actual value. Had I only been trained to look one inch down from the top of the page and two inches in from the right edge, I might have missed the value I was looking for.
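The article does not say how the product learns document types, so the following is only a toy stand-in for the classification step: a nearest-profile classifier over word counts, scored with cosine similarity. The sample texts, labels, and function names are hypothetical, and a production system would rely on far richer features and a properly trained model.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts: a crude stand-in for the 'traits' a real model learns."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def train(samples):
    """Sum word counts per document type. Cosine ignores vector length, so the
    summed profile ranks documents the same way the average (centroid) would."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(bag_of_words(text))
    return profiles


def classify(profiles, text):
    """Return the document type whose profile is most similar to the text."""
    words = bag_of_words(text)
    return max(profiles, key=lambda label: cosine(words, profiles[label]))


# Hypothetical labeled training pages.
samples = [
    ("invoice", "Invoice Number Date Bill To Amount Due Total"),
    ("invoice", "Invoice No. Remit To Total Amount Due"),
    ("purchase_order", "Purchase Order PO Number Ship To Vendor Quantity"),
    ("purchase_order", "Purchase Order Number Deliver To Item Quantity Unit Price"),
]
profiles = train(samples)

print(classify(profiles, "Invoice Number 10442 Amount Due 125.00"))
print(classify(profiles, "Purchase Order PO Number 7781 Ship To"))
```

The first page is sorted as an invoice and the second as a purchase order, without any reference to where words sit on the page.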
That is why our software is trained to look at a document in this manner rather than focusing on a particular zone. I teach it to examine a document several ways, based on the text values it contains. This handles variations such as the value appearing below a keyword instead of to its right. A zonal system would need two separate templates to cover those two layouts, so we never teach the software to zone in on a fixed area. Moreover, if the keyword moves on the page because the document itself was redesigned, the value is still picked up, because it is still near the keyword even though it now sits in a new physical location. In a zonal OCR system, that same scenario would require building a new template.
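The keyword-then-neighbor search described above can be sketched as follows. This is an illustration, not the product's method: it assumes the OCR engine returns short text fragments with pixel bounding boxes, and the `Fragment` type, `find_value` function, keyword set, and `max_gap` distance are hypothetical. It checks to the right of a label first, then below it, so a single rule covers both layouts and keeps working when the label moves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A short run of OCR text and its pixel bounding box (hypothetical OCR output)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical label variants that introduce an invoice number.
KEYWORDS = {"invoice number", "invoice no"}


def normalize(text):
    """Lower-case and trim edge punctuation so 'Invoice No.' matches 'invoice no'."""
    return text.lower().strip(" :.#")


def same_row(a, b):
    """True if the two fragments overlap vertically."""
    return a.top < b.bottom and b.top < a.bottom


def same_column(a, b):
    """True if the two fragments overlap horizontally."""
    return a.left < b.right and b.left < a.right


def find_value(fragments, keywords=KEYWORDS, max_gap=600):
    """Find a label, then return the nearest fragment to its right, else below it."""
    for label in fragments:
        if normalize(label.text) not in keywords:
            continue
        # 1) Look to the right of the label, on the same row, within max_gap pixels.
        right = [f for f in fragments
                 if f is not label and same_row(f, label)
                 and 0 <= f.left - label.right <= max_gap]
        if right:
            return min(right, key=lambda f: f.left).text
        # 2) Otherwise look below the label, in the same column, within max_gap pixels.
        below = [f for f in fragments
                 if f is not label and same_column(f, label)
                 and 0 <= f.top - label.bottom <= max_gap]
        if below:
            return min(below, key=lambda f: f.top).text
    return None  # no label found, or nothing near it


# Original layout: label near the top right, value to its right.
layout_a = [
    Fragment("ACME Supply", 100, 100, 600, 160),
    Fragment("Invoice Number:", 1400, 300, 1800, 350),
    Fragment("INV-10442", 1850, 300, 2100, 350),
]
# Redesigned layout: label moved to the lower left, value now below it.
layout_b = [
    Fragment("ACME Supply", 100, 100, 600, 160),
    Fragment("Invoice No.", 100, 900, 400, 950),
    Fragment("Total Due", 1400, 900, 1700, 950),  # same row, but beyond max_gap
    Fragment("INV-10442", 100, 970, 350, 1020),
]

print(find_value(layout_a))
print(find_value(layout_b))
```

Both calls return `INV-10442`, even though the label changed wording, moved across the page, and the value switched from the right side to underneath.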
Finally, this can be achieved with an automation rate of 70% or more. That means if you scan one million documents, roughly seven hundred thousand of them will never need to be touched. Some people strive for 100% automation, but there is no such thing. There will always be a creased page, or ink too light to pick up on the scan, and those pages will be kicked out. The remaining 30% may only need to be reviewed because of low OCR confidence levels. Bottom line: I have not seen any other product that achieves index extraction at this rate.
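The split between untouched documents and the review queue can be sketched as a simple confidence threshold. The 0.90 threshold, the per-field confidence scores, and the `Extraction`/`route` names are assumptions for illustration; real products expose their own confidence settings. A document passes straight through only if every extracted field clears the threshold; anything else, including a page where nothing was extracted, is queued for review.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Hypothetical per-document result: field name -> OCR confidence (0.0 to 1.0)."""
    doc_id: str
    confidences: dict = field(default_factory=dict)


def route(extractions, threshold=0.90):
    """Split documents into a straight-through queue and a manual-review queue."""
    auto, review = [], []
    for ex in extractions:
        # Straight through only if something was extracted and every field clears the bar.
        if ex.confidences and min(ex.confidences.values()) >= threshold:
            auto.append(ex.doc_id)
        else:
            review.append(ex.doc_id)
    return auto, review


def automation_rate(auto, review):
    """Fraction of documents that never needed a human touch."""
    total = len(auto) + len(review)
    return len(auto) / total if total else 0.0


# Hypothetical batch: seven clean pages and three problem pages.
batch = [Extraction(f"doc{i}", {"invoice_number": 0.97, "total": 0.95}) for i in range(7)]
batch += [
    Extraction("doc7", {"invoice_number": 0.62, "total": 0.96}),  # e.g. light ink
    Extraction("doc8", {"invoice_number": 0.93, "total": 0.41}),  # e.g. creased page
    Extraction("doc9"),                                           # nothing extracted
]

auto, review = route(batch)
rate = automation_rate(auto, review)
print(f"{rate:.0%} automated; review queue: {review}")
print(f"Untouched out of 1,000,000: {round(1_000_000 * rate):,}")
```

With 7 of the 10 sample documents passing, the rate is 70%, which projects to 700,000 untouched documents out of one million.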
Final Thoughts
If you are considering an OCR product to automate the indexing of your scanned documents, search for ‘Free Form OCR’. And please send my condolences to all Zonal-based products.

