We are new to HCI development. We built a sample OCR stage plugin and would like to share it with the community.
This plugin has a specific use case: it processes scanned image files and converts them to text using the Tesseract OCR library. It then looks for specific text or fields within the document and attaches them to the HCI document's metadata fields. The fields to extract can be configured to suit your needs by providing regular expressions.
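To illustrate the regular-expression step described above, here is a minimal sketch of pulling fields out of OCR text and collecting them as metadata key/value pairs. It is not taken from the plugin's source: the class name `OcrFieldExtractor`, the method `extractFields`, the field names and the patterns are all hypothetical, and it assumes each pattern puts the value of interest in its first capture group. The OCR call itself is left out, so the text is hard-coded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the plugin or the HCI SDK.
class OcrFieldExtractor {

    // Applies each configured regex to the OCR text. Assumption: the first
    // capture group of a pattern holds the value to store for that field.
    static Map<String, String> extractFields(String ocrText, Map<String, String> fieldPatterns) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldPatterns.entrySet()) {
            Matcher matcher = Pattern.compile(entry.getValue()).matcher(ocrText);
            // Only the first match is used; fields with no match are skipped.
            if (matcher.find() && matcher.groupCount() >= 1 && matcher.group(1) != null) {
                metadata.put(entry.getKey(), matcher.group(1).trim());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Stand-in for text produced by the OCR step (made-up sample).
        String ocrText = "Invoice No: INV-1234\nDate: 2020-01-15\nTotal: 99.50";

        // Made-up field names mapped to made-up patterns.
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("invoiceNumber", "Invoice No:\\s*(\\S+)");
        patterns.put("invoiceDate", "Date:\\s*(\\d{4}-\\d{2}-\\d{2})");

        System.out.println(extractFields(ocrText, patterns));
    }
}
```

In a real stage, the resulting map entries would be written to the document's metadata fields through the plugin API. That part is omitted here because it depends on the SDK.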
The entire source code, the JAR, and the setup document can be downloaded from here (GitHub). We hope this helps the team. Feel free to share your suggestions.
Note: For better results, use images scanned at 300 DPI.