
PDF table ingestion and conversion to CSV

Question asked by Vinod Subramaniam (Employee) on Sep 27, 2017
Latest reply on Nov 17, 2017 by Rafael Valenzuela

Good morning


I'm starting work on a Kettle plugin that ingests tables from PDF files and converts them to CSV, to be used as input to the TextInput plugin.

What is a good method for recognizing tables in PDF files?

1. Use markers and an API such as Apache PDFBox.

2. Convert the PDF to images and apply image recognition algorithms.
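Option 1 can be sketched roughly as follows, assuming "markers" means known x-coordinates where each column begins. In PDFBox, word positions could be collected by subclassing `PDFTextStripper` and overriding its protected `writeString(String text, List<TextPosition> textPositions)` method, reading `TextPosition.getXDirAdj()` / `getYDirAdj()`. To keep the example self-contained, the PDFBox part is left out: the hypothetical `Chunk` record stands in for those collected positions, and the coordinates, column boundaries and tolerance below are made up. Words are grouped into rows by y-coordinate, assigned to columns by the markers, and written out with RFC 4180-style quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class PdfTableToCsv {

    // One word with its page coordinates. Hypothetical stand-in for values that a
    // PDFTextStripper subclass could collect from TextPosition.getXDirAdj()/getYDirAdj().
    record Chunk(String text, double x, double y) {}

    // Cluster chunks into rows: a chunk starts a new row when its y differs from
    // the first chunk of the current row by more than yTolerance.
    static List<List<Chunk>> groupRows(List<Chunk> chunks, double yTolerance) {
        Comparator<Chunk> byY = Comparator.comparingDouble(Chunk::y);
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(byY.thenComparingDouble(Chunk::x));

        List<List<Chunk>> rows = new ArrayList<>();
        List<Chunk> current = new ArrayList<>();
        double rowY = 0;
        for (Chunk c : sorted) {
            if (!current.isEmpty() && Math.abs(c.y() - rowY) > yTolerance) {
                rows.add(current);
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                rowY = c.y(); // first chunk anchors the row's y position
            }
            current.add(c);
        }
        if (!current.isEmpty()) {
            rows.add(current);
        }
        // Within a row, read cells left to right.
        for (List<Chunk> row : rows) {
            row.sort(Comparator.comparingDouble(Chunk::x));
        }
        return rows;
    }

    // Column index = last column whose left boundary ("marker") is <= x.
    static int columnFor(double x, double[] columnStarts) {
        int col = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    // Quote a field RFC 4180-style when it contains a comma, quote or line break.
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    static String toCsv(List<Chunk> chunks, double[] columnStarts, double yTolerance) {
        StringBuilder out = new StringBuilder();
        for (List<Chunk> row : groupRows(chunks, yTolerance)) {
            String[] cells = new String[columnStarts.length];
            Arrays.fill(cells, "");
            for (Chunk c : row) {
                int col = columnFor(c.x(), columnStarts);
                // Words that land in the same column are joined with a space.
                cells[col] = cells[col].isEmpty() ? c.text() : cells[col] + " " + c.text();
            }
            out.append(Arrays.stream(cells)
                    .map(PdfTableToCsv::csvField)
                    .collect(Collectors.joining(",")))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates for a two-column table; column 1 starts at x = 150.
        List<Chunk> chunks = List.of(
                new Chunk("Sales", 200, 100.4), new Chunk("Region", 50, 100),
                new Chunk("North", 50, 115), new Chunk("East", 82, 115),
                new Chunk("1,200", 200, 115.3),
                new Chunk("South", 50, 130), new Chunk("950", 200, 130));
        System.out.print(toCsv(chunks, new double[] {0, 150}, 2.0));
    }
}
```

Running `main` prints `Region,Sales`, `North East,"1,200"` and `South,950` on three lines, which a CSV reader configured with `"` as the enclosure character should split into two fields per row. Fixed column markers are simpler than inferring columns from whitespace gaps, but they assume the table layout is known in advance; wrapped cells, merged cells and multi-page tables would need additional handling.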


Please share your experience.


Thanks
