 Optimal way to join multiple files and output without creating multiple transformations

James Antony posted 07-16-2019 15:10

I have a task where I need to join multiple files and create a specific output file(s).

At the moment I have created around 10 transformations to carry out this task (one for each output file).

To me this seems quite cumbersome (not to mention tedious), so I would like to know if there is a more optimal way to achieve this?

The joins that create these output files are all very similar; in fact, many consist of the same joins.

Here's an example of two of the transformations I have created:

pastedimage_1

pastedimage_2

As you can see, some of the joins are repeated; this is a common pattern for the remaining transformations that I intend to create, with the only differences being the files to join and the fields selected for the output.

Any help on this is appreciated.


Johan Hammink

All the steps are supported by Metadata Injection. 

James Antony

Thank you for the suggestion, Johan.

I’ve spent some time trying to integrate the Metadata Injection step into my solution (I have never used it before), but I’m having problems getting it to work.

My idea was to create a single template transformation that is responsible for joining all the source files (11 in total), like the ones above, and a second transformation responsible for injecting the metadata.

The problem I seem to be having is that I can only inject data for all the joins. When I try to inject data for, let’s say, two files (a single join), an error is thrown. It seems to want me to inject all files.

Here’s an example of my injection transformation; I feel that my approach might be all wrong.

pastedimage_1

Any assistance with this is very much appreciated.

Brandon Jackson

The Metadata Injection step is pretty unusual. If you are injecting into a step such as Text Output, which expects output field columns and other metadata, you send multiple rows (one for each field in the output). The other inputs streaming in are single values passed to the destination step. Once enough has been populated to satisfy the steps in the injected ktr, PDI runs the injected transformation. This is important because it will not loop automatically. You'll need something to loop this for you and send a complete set of step configuration into the injector.
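To make the "one row per output field, one complete set per run" idea concrete, here is a minimal Python sketch of the data shape only. It does not call PDI; the file names, field names, and the helpers `field_rows` and `injection_batches` are all hypothetical, and in practice these rows would come from a file or table read by the injecting transformation.

```python
from collections import namedtuple

# Hypothetical outputs: each output file and the fields it should contain.
OUTPUTS = {
    "customers_orders.csv": ["customer_id", "order_id", "order_date"],
    "orders_items.csv": ["order_id", "item_id", "quantity"],
}

# One row of field metadata, as it might be streamed into the injector.
FieldRow = namedtuple("FieldRow", ["filename", "field_name", "field_type"])


def field_rows(filename, fields, default_type="String"):
    """Return one metadata row per output field (the multi-row input)."""
    return [FieldRow(filename, name, default_type) for name in fields]


def injection_batches(outputs):
    """Yield one complete configuration set per output file.

    Each yielded batch corresponds to one run of the template,
    which is the loop the injector does not perform by itself.
    """
    for filename, fields in outputs.items():
        yield filename, field_rows(filename, fields)


if __name__ == "__main__":
    for filename, rows in injection_batches(OUTPUTS):
        print(filename)
        for row in rows:
            print("  ", row.field_name, row.field_type)
```

The outer loop here plays the role of whatever drives the injector repeatedly, for example a job that runs the injecting transformation once per output.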

A good way to develop and troubleshoot is to output the injected transformation (an option on a tab in the step) and see what it did. Watch for things that typically slip past Hitachi's QA, such as checkboxes set as constants in the injection that don't show up in the injected transformation. It happens. You get what you inspect, not what you expect. If you run into a bug, please report it at jira.pentaho.com with a ktr and reproduction steps. If you have a support agreement, please submit a support case. Good luck.
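Since a saved ktr is an XML file, one way to inspect it outside Spoon is a small script that lists each step and any fields it carries. The sketch below is an assumption-laden illustration: it assumes steps appear as `<step>` elements directly under the root with `<name>`, `<type>`, and optionally `<fields>/<field>/<name>` children, and it runs against a hand-written sample rather than a real injected file. The helper name `summarize_ktr` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a saved injected ktr (not real PDI output).
SAMPLE_KTR = """
<transformation>
  <info><name>injected_example</name></info>
  <step>
    <name>Join A-B</name>
    <type>MergeJoin</type>
  </step>
  <step>
    <name>Write output</name>
    <type>TextFileOutput</type>
    <fields>
      <field><name>customer_id</name></field>
      <field><name>order_id</name></field>
    </fields>
  </step>
</transformation>
"""


def summarize_ktr(xml_text):
    """Return (step name, step type, field names) for each top-level step."""
    root = ET.fromstring(xml_text)
    summary = []
    for step in root.findall("step"):
        name = step.findtext("name", default="")
        step_type = step.findtext("type", default="")
        fields = [
            field.findtext("name", default="")
            for field in step.findall("./fields/field")
        ]
        summary.append((name, step_type, fields))
    return summary


if __name__ == "__main__":
    for name, step_type, fields in summarize_ktr(SAMPLE_KTR):
        print(f"{name} [{step_type}] fields={fields}")
```

Comparing this listing with what you intended to inject makes missing fields or settings easier to spot than clicking through each step dialog.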