Pentaho

 View Only

 How to put in "Output Table" process, 3 Excels with different number of rows

  • Pentaho
  • Kettle
  • Pentaho
  • Pentaho Data Integration PDI
Ana Maria's profile image
Ana Maria posted 05-13-2020 18:10

I'm new in PDI and I would like to create an output Table (my last step to finish ETL) from three different Excels (but with the same primary key), with different columns and different number of rows. The error I have from the system is it cannot do output table procces because of different number of rows.

 

I cannot find any solution in Internet... Thank you!


#Pentaho
#PentahoDataIntegrationPDI
#Kettle
Andrew Cave's profile image
Andrew Cave

Hi Ana

 

What you will need to do is to join the streams for the three Excel files using Merge Join steps (first 1 to 2 then join 1&2 to 3) and then output to the table

Ana Maria's profile image
Ana Maria

Hello Andrew,

 

Thank you for your answer, it works!!😁

But I have a question just to check. In "Merge Join" step I put "Full Outer" in union type. Is it ok? Because If I put "Inner" It is going to appear just the common rows, and I have different number of rows.

 

You save my life! Thank you again

Andrew Cave's profile image
Andrew Cave

FULL OUTER tries to match on the key. If it matches it brings the matching rows from both sources. If they don't, they will return one source and null values for the other. An INNER join only returns the rows from the two sources where the key column matches.

 

I assume that makes sense in your situation.

Ana Maria's profile image
Ana Maria

Fantastic, thank you so much! Can I ask you another thing please? It's related with output table steps.

 

I have one fact table and tables around it which are dimensions. The relation is I-N.

From Merger Join step I add Table output step, I select the table that I want to create ( the first one is fact table) in "target table" section. But in SQL buttom appears "Alter TABLE name of fact table modify name of each column", and I don´t think the systemas has to modify anything...

Furthermore, when I run everything, the final errors of this step is "Error batch inserting rows into table [name of fact table]" and the second error is "Unknown colum "primary key_1".

 

*This primary key_1 comes from when I merge Excel 1 with Excel 2, an the result of this union is primary_key, columns of first Excel, primary_key1, columns from second Excel.

 

I hope I explain myself clear, and thank you!!