PDF table ingestion and conversion to CSV

Question asked by Vinod Subramaniam Employee on Sep 27, 2017
Good morning


I'm starting work on a Kettle plugin that ingests tables from PDF files and converts them to CSV, as input to the TextInput plugin.

What is a good method for recognizing tables in a PDF?

1. Use markers and an API such as Apache PDFBox.

2. Convert the PDF to an image and use image recognition algorithms.
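
To make option 1 more concrete, here is a minimal sketch of the position-based idea. It assumes the text fragments and their page coordinates have already been extracted (for example from PDFBox `TextPosition` objects), that y grows down the page, and that each cell arrives as a single fragment. The class `PdfTableSketch`, the `Fragment` type, and the tolerance values are hypothetical names chosen for illustration; they are not part of PDFBox or Kettle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not a PDFBox or Kettle class.
class PdfTableSketch {

    // One positioned piece of text (assumed to be supplied by the PDF extraction step).
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Groups fragments into rows by y and columns by x, then renders them as CSV.
    static String toCsv(List<Fragment> fragments, float rowTolerance, float colTolerance) {
        if (fragments.isEmpty()) {
            return "";
        }
        // Column anchors: walk the sorted x positions and start a new column
        // whenever the gap to the previous x exceeds the tolerance.
        List<Fragment> byX = new ArrayList<>(fragments);
        byX.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        List<Float> anchors = new ArrayList<>();
        float prevX = 0;
        for (Fragment f : byX) {
            if (anchors.isEmpty() || f.x - prevX > colTolerance) {
                anchors.add(f.x);
            }
            prevX = f.x;
        }
        // Rows: walk the fragments top to bottom and start a new row
        // whenever the gap to the previous y exceeds the tolerance.
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));
        StringBuilder csv = new StringBuilder();
        List<Fragment> row = new ArrayList<>();
        float prevY = 0;
        for (Fragment f : byY) {
            if (!row.isEmpty() && f.y - prevY > rowTolerance) {
                appendRow(csv, row, anchors);
                row.clear();
            }
            row.add(f);
            prevY = f.y;
        }
        appendRow(csv, row, anchors);
        return csv.toString();
    }

    // Places each fragment in the column with the nearest anchor; missing cells stay empty.
    private static void appendRow(StringBuilder csv, List<Fragment> row, List<Float> anchors) {
        row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        String[] cells = new String[anchors.size()];
        Arrays.fill(cells, "");
        for (Fragment f : row) {
            int col = 0;
            for (int i = 1; i < anchors.size(); i++) {
                if (Math.abs(anchors.get(i) - f.x) < Math.abs(anchors.get(col) - f.x)) {
                    col = i;
                }
            }
            cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
        }
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(escape(cells[i]));
        }
        csv.append('\n');
    }

    // Quotes a cell if it contains a comma, a double quote, or a line break.
    private static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    public static void main(String[] args) {
        // Made-up sample coordinates, only to show the call.
        List<Fragment> sample = Arrays.asList(
                new Fragment(50f, 100f, "Name"), new Fragment(200f, 100.5f, "Qty"),
                new Fragment(50f, 120f, "Bolt"), new Fragment(201f, 120f, "3"));
        System.out.print(toCsv(sample, 3f, 10f));
    }
}
```

The tolerances would need tuning per document, and multi-word cells or cells that span columns would need extra merging logic that this sketch does not attempt.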


Please share your experience.