Pentaho

 Help with comparing input data to a database table

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Tejraj Devaraju posted 06-25-2020 23:39

Dear Team,

I have a scenario where I need to import a CSV file of around 80,000 records, where each record may or may not contain modifications to its fields. Before importing, I need to compare this data with the values in the database table. I have tried a database lookup, but it is quite slow. Kindly let me know if there is a better approach.

Regards,

Tej D

Brandon Jackson

One trick I have used is to create a SHA checksum for each row in the database table I want to compare against, then do the same for each row in your input files. If you precalculate the checksum for the database rows, you can even store each SHA value together with the primary key of the destination row somewhere else. Comparing two SHA values is very quick, and you will neither pound your database nor alter any schema in a way your DBA would get upset about.
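The checksum comparison can be sketched outside of PDI in Python. This is a minimal illustration under stated assumptions, not the poster's implementation: it uses SHA-256 from Python's standard `hashlib` module, assumes the primary key is the first column, and `db_rows` is a hypothetical stand-in for rows read once from the destination table (or from wherever the precomputed keys and checksums are stored). The helper names `row_hash` and `classify` are also hypothetical.

```python
import csv
import hashlib
import io

# Unit-separator character placed between fields, so that ("ab", "c") and
# ("a", "bc") do not collapse to the same string before hashing.
SEP = "\x1f"


def row_hash(fields):
    """Return a SHA-256 hex digest of one row.

    Values are converted to text and stripped; None becomes "".
    Assumption: the table side and the CSV side use the same column order
    and the same text formatting, otherwise identical rows hash differently.
    """
    joined = SEP.join("" if f is None else str(f).strip() for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify(csv_rows, stored_hashes, key_index=0):
    """Split incoming rows into new, changed and unchanged lists.

    stored_hashes maps primary key -> hash precomputed from the table,
    so the database is read once instead of queried once per CSV row.
    """
    new, changed, unchanged = [], [], []
    for row in csv_rows:
        old = stored_hashes.get(row[key_index])
        if old is None:
            new.append(row)  # key not present in the table yet
        elif old != row_hash(row):
            changed.append(row)  # same key, different field values
        else:
            unchanged.append(row)  # identical row, nothing to import
    return new, changed, unchanged


if __name__ == "__main__":
    # Hypothetical stand-in for a query result from the destination table.
    db_rows = [("1", "Alice", "Paris"), ("2", "Bob", "Rome")]
    stored = {r[0]: row_hash(r) for r in db_rows}

    csv_text = "id,name,city\n1,Alice,Paris\n2,Bob,Milan\n3,Carol,Oslo\n"
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    new, changed, unchanged = classify(reader, stored)
    print("new:", new)
    print("changed:", changed)
    print("unchanged:", unchanged)
```

Only the rows in `new` and `changed` need to be written, and the table is read once rather than looked up once per CSV record. The hashes only match when both sides render values identically, so dates, decimals and key types (for example an integer key from the database versus a text key from the CSV) should be normalized the same way before hashing.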

There are many approaches, depending on what level of comparison you need to perform.