Pentaho

 I need to develop a dynamic transformation where my source file keeps changing its column count

EN Ratnam posted 03-24-2022 02:36
I have a requirement where my source file is dynamic: its columns keep changing. If anyone has come across this kind of requirement, please share the solution and, if possible, a .ktr file.

Suppose sample file:
id,name,loc,pin
100,abc,bang,560000
101,cdf,hyd,888888
102,ghj,bang,000111

On the second day, the same file will arrive as:
id,name,loc,pin,status
100,abc,bang,560000,y
101,cdf,hyd,888888,n
102,ghj,bang,000111,n

With this requirement, I don't want to change my transformation; it has to pick up the column headers dynamically. A similar mechanism is available in Informatica BDM.
Ana Gonzalez
You'll need two transformations. The first transformation reads the file using the line end as the row delimiter, so you get each whole line as a single column, and you keep only the first line:

Column1
id,name,loc,pin

Then you use the Split field to rows step, with the comma as the delimiter, and you get the column names of the file you want to read:
id
name
loc
pin
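
Outside of PDI, the logic of this first transformation can be sketched in a few lines of Python. This is only an illustration of the idea, not how the PDI steps are implemented; the function names read_header_line and split_field_to_rows are hypothetical and exist only for this example.

```python
import io

def read_header_line(stream):
    """Return the first line of the stream with its line ending removed."""
    return stream.readline().rstrip("\r\n")

def split_field_to_rows(line, delimiter=","):
    """Turn one delimited field into one value per row."""
    return [name.strip() for name in line.split(delimiter)]

# In-memory stand-in for the sample file from the question.
sample = io.StringIO(
    "id,name,loc,pin\n"
    "100,abc,bang,560000\n"
    "101,cdf,hyd,888888\n"
)

header = read_header_line(sample)           # whole first line as one field
column_names = split_field_to_rows(header)  # one column name per row

for name in column_names:
    print(name)
```

Each printed line corresponds to one row coming out of the split, which is the list of column names to hand to the second transformation.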

You use this information to inject metadata into a second transformation. In this second transformation you read the whole file, but the column names are injected at run time instead of being provided when you create the transformation.
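
As a rough analogy for the two-stage pattern, the Python sketch below discovers the column names first and then passes them into a generic reader that has no columns hard-coded. It does not use any Pentaho API; discover_columns and read_with_injected_columns are hypothetical names, and the two strings stand in for the day-one and day-two files from the question.

```python
import csv
import io

def discover_columns(text, delimiter=","):
    """Stage 1: read only the header line and return the column names."""
    first_line = text.splitlines()[0]
    return first_line.split(delimiter)

def read_with_injected_columns(text, column_names):
    """Stage 2: read the data rows using column names supplied by the caller."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line; the names come from stage 1
    return [dict(zip(column_names, row)) for row in reader]

day1 = "id,name,loc,pin\n100,abc,bang,560000\n101,cdf,hyd,888888\n"
day2 = "id,name,loc,pin,status\n100,abc,bang,560000,y\n101,cdf,hyd,888888,n\n"

# The same two functions handle both layouts without modification.
for text in (day1, day2):
    columns = discover_columns(text)
    rows = read_with_injected_columns(text, columns)
    print(columns, rows[0])
```

The point of the split is that the reader in stage 2 never changes when a column such as status is added; only the list produced by stage 1 does.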

There's an example of how to do this in your Pentaho installation folder: PDI_FOLDER/samples/transformations/meta-inject. This folder contains two transformations. use_metainject_step.ktr does what your first transformation should achieve, although you only need the part that provides the column names. The second transformation, read_csv_file.ktr, has steps that lack the information needed to run as is, because that information is injected when you run the first transformation.
Stephen Donovan
Ana is correct about the use of Metadata Injection.

There are additional steps, though they are EE only, that will read the metadata for you, so you do not need to parse the line and generate or store your own metadata. Depending on how widespread this use case is, it may become worthwhile to have that functionality and support.