Pentaho


Can Pentaho read .pdf files as input?

Nilesh Purohit posted 09-17-2018 18:29

Hi Team,

I have a requirement to read a .pdf file, which we can't convert into .txt or any other format due to insufficient privileges. Can Pentaho read .pdf files? If yes, could you please suggest how?


#Pentaho
#PentahoDataIntegrationPDI
#Kettle
Johan Hammink

There is a plugin, "Load text from file", which according to the documentation is able to read PDF files. I have never used it, so I don't know how it works. You can download the plugin from the Marketplace.

Alain Debecker

Yes, you can, but I do not know of any out-of-the-box PDF-to-text converter.

You can:

In the last case, please publish your work.

Dan Keeley

Indeed it can! Check out my blog:

Unstructured data, Apache Tika and Beer | Codeks Blog