Our text-reading solutions wrap standard OCR engines in custom processing steps to improve recognition of complex images.
This figure shows a typical workflow shared by today’s off-the-shelf OCR systems:
It is a tried-and-true design that works well for regular scanned documents.
For other image types, such as photos of street signs, this design proves inadequate.
See what the workflow looks like in a CustomOCR text-reading solution, and how it differs from a regular OCR system:
CustomOCR inserts additional processing steps before and after the call to the OCR engine, while skipping the OCR functions that are relevant only to scanned documents.
The steps before OCR build up an “understanding” of the image. This stage produces a series of small images that closely resemble scanned mini-documents. On those images, the OCR engine performs far better than on the original image.
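To make the idea of cutting an image into "mini-documents" concrete, here is a minimal sketch in Python. It is not the CustomOCR implementation: the function names `find_text_bands` and `crop_bands`, the list-of-rows image representation, and the row-projection technique (grouping consecutive rows that contain dark pixels) are all assumptions chosen for illustration. A real street-sign pipeline would likely need far more robust region detection.

```python
from typing import List, Tuple

# Assumed representation: a grayscale image as rows of pixel values,
# where 0 is black and 255 is white.
Image = List[List[int]]


def find_text_bands(image: Image, dark_threshold: int = 128) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges whose rows contain at least one dark pixel.

    `bottom` is exclusive, so image[top:bottom] is the band itself.
    """
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the band currently being scanned, if any
    for y, row in enumerate(image):
        has_ink = any(pixel < dark_threshold for pixel in row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band runs to the bottom edge
    return bands


def crop_bands(image: Image, bands: List[Tuple[int, int]]) -> List[Image]:
    """Cut each band out of the image as its own small image."""
    return [image[top:bottom] for top, bottom in bands]
```

Each cropped band can then be passed to the OCR engine separately, which is the sense in which the engine only ever sees small, document-like images.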
The steps after OCR improve the OCR results and extract data, using application-specific knowledge and predefined rules.
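As an illustration of post-OCR correction driven by application-specific knowledge, the sketch below snaps each recognized word to the closest entry in a known vocabulary. Everything here is an assumption made for the example: the vocabulary `KNOWN_WORDS`, the helper names, the similarity cutoff, and the `ocr_engine` parameter, which stands in for whatever callable returns raw text for an image region. Fuzzy matching uses `difflib.get_close_matches` from the Python standard library; the actual CustomOCR rules are not described in this text.

```python
import difflib
from typing import Callable, List, Sequence

# Hypothetical application vocabulary, e.g. words expected on street signs.
KNOWN_WORDS = ["MAIN", "STREET", "AVENUE", "STOP"]


def correct_word(word: str, vocabulary: Sequence[str], cutoff: float = 0.6) -> str:
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word.upper(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # leave unknown words untouched


def postprocess(raw_text: str, vocabulary: Sequence[str]) -> str:
    """Apply word-level correction to one OCR result."""
    return " ".join(correct_word(w, vocabulary) for w in raw_text.split())


def read_text(regions: Sequence[object],
              ocr_engine: Callable[[object], str],
              vocabulary: Sequence[str]) -> List[str]:
    """Run the (assumed) OCR callable on each region, then correct its output."""
    return [postprocess(ocr_engine(region), vocabulary) for region in regions]
```

With a stub engine that returns `"MA1N STRE3T"` for one region, `read_text` yields `"MAIN STREET"`: the digits misread in place of letters are repaired because the corrected words are the nearest vocabulary entries. Words with no close match, such as a house number, pass through unchanged.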
Read more about CustomOCR solutions’ key components:
As the core character recognition component in our solutions, we chose Tesseract, an open-source OCR engine whose development has been sponsored by Google. It is one of the top-performing OCR systems available today.