Pentaho

 Intention of Single Threader step in Pentaho PDI

Robert Walpole posted 07-27-2022 12:05
Hi all,
We have been experimenting with using the Single Threader step to call a sub-transformation in place of a Simple Mapping in Pentaho PDI, on the assumption that the sub-transformation would then run in a single thread instead of using one thread for each step it contains.

It seems, however, that the Single Threader step is not a drop-in replacement for a Simple Mapping. For starters, it doesn't have anywhere to specify input and output fields and, more significantly for us, the output row metadata does not seem to accurately reflect the input row metadata. For example, if the input row metadata contains a field that is not referenced in the single-threaded sub-transformation, then this field is not visible in the row metadata of the steps following the Single Threader step. I say "not visible" because an existing step that already works with that field still works as expected, so the data is still there on some level, but I cannot see the field listed in the input or output fields of the step.
I can't say whether this is expected behaviour or whether I am using the Single Threader incorrectly. Is it intended to work as a replacement for a sub-transformation mapping, or is it intended more for testing and debugging? Having to select the Retrieval step in the dialogue suggests to me that it could be used for debugging, as I don't understand the point of doing this otherwise. Surely you would normally want the sub-transformation to run to the end?
Many thanks in anticipation.
Rob
Carlos Lopez
Robert
Do you have a sample that demonstrates the behavior you are experiencing? I would like to give this a try.
Robert Walpole

Hi Carlos,
Thanks for your reply!
Sure, I have an example which I attach.
There is a simple transformation with a Data Grid containing first_name and last_name fields. Via a Single Threader step, it calls a sub-transformation that concatenates them into a full_name field. Before the Single Threader there is an Add constants step, which adds a constant called project_name; this can be seen in the Input Fields of the Single Threader but not in the Output Fields. Despite this, I can copy the project_name field in the Calculator step after the Single Threader. In the second Calculator step, however, I cannot call up the project_name field, although I can see the project_copy field. I should mention that the first Calculator step was created when the preceding step was a Simple Mapping call to the same sub-transformation, which is why I was able to access the project_name field at that time.
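
To make the symptom concrete without opening the attachment, here is a rough sketch of what I am seeing. It is plain Python, not the PDI/Kettle API, and it only models the observed behaviour; it does not claim to show how the Single Threader is implemented. The function names, the sample values, the space used to join the names, and the idea that the dialog lists only the fields the sub-transformation declares are all my assumptions. I have also assumed that project_copy is the target of the first Calculator step.

```python
# Toy model of the symptom described above. Plain Python, NOT the PDI/Kettle API.
# All function names and sample values are hypothetical.

# Assumption: the Output Fields view lists only what the sub-transformation declares.
SUB_TRANSFORMATION_FIELDS = ["first_name", "last_name", "full_name"]


def add_constants(row):
    """Add the project_name constant (the sample value is made up)."""
    return {**row, "project_name": "demo_project"}


def single_threader(row):
    """Run the sub-transformation: build full_name, pass other fields through."""
    # Assumption: the two names are joined with a single space.
    return {**row, "full_name": row["first_name"] + " " + row["last_name"]}


def visible_output_fields():
    """Fields a later step's dialog would offer, under the assumption above."""
    return list(SUB_TRANSFORMATION_FIELDS)


def calculator_copy(row, source, target):
    """Copy one field to another by name, as the first Calculator step does."""
    return {**row, target: row[source]}


grid_row = {"first_name": "Ada", "last_name": "Lovelace"}  # made-up Data Grid row
row = single_threader(add_constants(grid_row))
row = calculator_copy(row, "project_name", "project_copy")

print("offered in dialog:", visible_output_fields())
print("present in row:   ", sorted(row))
```

In this model project_name is missing from the list a dialog would offer, yet it is still present in the row at runtime, which is why a step that already refers to it by name keeps working.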

Cheers

Rob