Pentaho

 How to make Insert/Update not ignore duplicate values from production table?

Anna Nguyen posted 09-30-2023 07:08

I created a Pentaho transformation that parses JSON from a production table and loads the data into a warehouse table. To avoid inserting new duplicates, I used Insert/Update as the final step. It works well, but it also ignores the duplicates that already exist in the production table. For my use case, I need those existing duplicates to be carried over to the warehouse table, and I'm not sure how to do that.

The only way I could think of was to change the Comparator from = to > in the "The key(s) to look up the value(s)" box. That does include the existing duplicates on the first run, but when I run the transformation a second time, it inserts new duplicates that don't come from the production table. In other words, with the Comparator set to >, the Insert/Update step generates duplicates of its own on every rerun, and I want to avoid that.

What I want is to load the existing duplicates from the production table into the warehouse table without the Insert/Update step creating new ones. Is there a different approach I could take here? Thanks!

Anna Nguyen

Never mind, I finally figured it out. I assigned a unique identifier to each record in the table before running the transformation, and with that I was able to retain the existing duplicates.
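
For anyone landing here later, the idea can be sketched outside Pentaho. This is not the actual transformation, only a minimal Python illustration that uses an in-memory SQLite database in place of the real production and warehouse tables. The table and column names (`production`, `warehouse`, `rec_id`, `payload`) and the function name are hypothetical. It assumes the lookup key is the per-record identifier, compared with =, so two records with identical values still count as different rows, and a rerun updates them instead of inserting them again.

```python
import sqlite3


def insert_update_by_id(conn):
    """Mimic an insert/update keyed on a per-record identifier (= comparison)."""
    rows = conn.execute("SELECT rec_id, payload FROM production").fetchall()
    for rec_id, payload in rows:
        # Try to update the warehouse row that carries the same identifier.
        cur = conn.execute(
            "UPDATE warehouse SET payload = ? WHERE rec_id = ?",
            (payload, rec_id),
        )
        # No matching identifier yet: this record is new, so insert it.
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO warehouse (rec_id, payload) VALUES (?, ?)",
                (rec_id, payload),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: rec_id is the unique identifier assigned to each record.
conn.execute("CREATE TABLE production (rec_id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE warehouse (rec_id INTEGER PRIMARY KEY, payload TEXT)")

# Two records share the same payload: these are the "existing duplicates".
conn.executemany(
    "INSERT INTO production (payload) VALUES (?)",
    [("a",), ("a",), ("b",)],
)

insert_update_by_id(conn)  # first run loads all three records
insert_update_by_id(conn)  # second run only updates, nothing new is inserted

count = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
payloads = [r[0] for r in conn.execute("SELECT payload FROM warehouse ORDER BY rec_id")]
```

Because the match is on `rec_id` and not on the duplicated values, both "a" records reach the warehouse, and running the load again leaves the row count unchanged.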