General Discussion


Run the same Job simultaneously with different parameters in Pentaho PDI

  • 1.  Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-06-2022 08:08

    I developed a Job that receives the Customer Id as a parameter and loads the connection data from the database, so that I do not have to duplicate code and increase the maintenance effort.

    However, when the Job is executed simultaneously from a main Job, both the parameters and the variables scoped to the current Job keep the value of the last Job started.

    If I trigger the Job through Kitchen from the command line, it works perfectly and the variables are isolated. However, it is extremely slow and consumes far more machine resources, because several instances of Pentaho are running, even with JVM memory usage limited to 2 GB on an 8 GB machine with 4 vCPUs. I tested with 3 jobs running simultaneously.

    Has anyone had a similar problem, and how did you resolve it?
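
    For readers comparing the two approaches, the Kitchen variant described above can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the install path, the job path and the `max_parallel` limit are hypothetical, and `customerId` is the parameter name used later in this thread. Each call starts a separate JVM, which is why the variables stay isolated and also why memory use grows with every parallel run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical paths, adjust to the local installation and job location.
KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"
JOB_FILE = "/etl/etl_customer.kjb"


def build_kitchen_command(customer_id, kitchen=KITCHEN, job_file=JOB_FILE):
    """Return the argument list for one Kitchen run of the customer job."""
    return [
        kitchen,
        f"-file={job_file}",
        f"-param:customerId={customer_id}",  # named parameter declared on the job
        "-level=Basic",
    ]


def run_customer(customer_id):
    """Run one customer in its own Kitchen process and return the exit code."""
    return subprocess.run(build_kitchen_command(customer_id)).returncode


def run_all(customer_ids, max_parallel=2):
    """Run all customers, keeping at most max_parallel JVMs alive at once."""
    customer_ids = list(customer_ids)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        exit_codes = pool.map(run_customer, customer_ids)
        return dict(zip(customer_ids, exit_codes))
```

    Calling `run_all([1, 2, 3, 4, 5])` would return a mapping from customer Id to Kitchen exit code. Capping `max_parallel` bounds the number of JVMs and therefore the total memory, at the cost of running fewer customers at the same time.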



    ------------------------------
    Stevens Silva
    Chief Technology Officer
    Solcast
    ------------------------------


  • 2.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-07-2022 11:05

    It works in Kitchen because each execution is a new JVM. It sounds like the variable scope you are setting is at the JVM level. You will want to change that so the variable is set in the context of the job itself (which passes down), or of the parent job if you are setting it in a child transformation or job. Avoid the JVM level in almost every case.

    If you think you have done this properly, include a test transformation/job for us to review.
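
    The difference between a JVM-wide variable and a job-scoped one can be illustrated with a small toy model. This is not PDI code and makes no claim about PDI internals: the dictionaries and function names are invented for illustration. It only shows why one shared copy of a variable ends up holding the last value written when several runs share a process, while a per-run copy does not.

```python
import threading

# Stands in for a JVM-level variable: a single copy shared by every run.
jvm_variables = {}


def job_with_jvm_scope(customer_id, seen, barrier):
    jvm_variables["customerId"] = customer_id   # overwrites the shared value
    barrier.wait()                              # every run has written by now
    seen[customer_id] = jvm_variables["customerId"]


def job_with_job_scope(customer_id, seen, barrier):
    job_variables = {"customerId": customer_id}  # private to this run
    barrier.wait()
    seen[customer_id] = job_variables["customerId"]


def run_parallel(job, customer_ids):
    """Run one thread per customer and record the value each run reads back."""
    seen = {}
    barrier = threading.Barrier(len(customer_ids))
    threads = [
        threading.Thread(target=job, args=(cid, seen, barrier))
        for cid in customer_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen
```

    With `run_parallel(job_with_jvm_scope, [1, 2, 3])` every run reads back the same value, whichever was written last. With `run_parallel(job_with_job_scope, [1, 2, 3])` each run reads back its own customer Id.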



    ------------------------------
    Stephen Donovan
    Digital Solutions Architect
    Hitachi Vantara
    ------------------------------



  • 3.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-07-2022 11:32
    Thanks Stephen.
    
    I had already tried setting the scope, but it does not seem to be respected. I ran a new test with a sub Job that receives parameters: for each client I initialize all the variables that the sub Job will receive as parameters, scoped to the current Job. Some variables still end up being shared between running jobs, as if the reference were lost and they were not isolated.

    Example Job Customer

    Generic Sub Job



    ------------------------------
    Stevens Silva
    Chief Technology Officer
    Solcast
    ------------------------------



  • 4.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-07-2022 23:25
    If you could include a simplified version that shows the variables in scope (a simple Write to log) in the subtransformations, along with your settings, the community can try to test this in various versions/environments and see if we can reproduce or fix what you are seeing.

    ------------------------------
    Stephen Donovan
    Digital Solutions Architect
    Hitachi Vantara
    ------------------------------



  • 5.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-12-2022 07:37
    I created a project as a repository of example files showing how I create variables, pass them as parameters, and define their scope. The project is much simpler than the real one, but it gives a good example of the problem: when started from the main job "start", the variables hold the value of the last execution.

    ------------------------------
    Stevens Silva
    Chief Technology Officer
    Solcast
    ------------------------------

    Attachment(s)

    zip
    teste.zip   18 KB 1 version


  • 6.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-12-2022 17:39

    Not sure it helps, but when I ran it in version 9.3 (with your Write to log altered a bit), I don't see the variable being set at all. I think this is because customerId is a parameter with an empty default value. You may be crossing variable and parameter context. However, I am not seeing it set to the wrong value, as you stated; it is not set at all.

    Line 68: 2022/12/12 17:36:58 - wtl-customers.0 - customerId =
    Line 83: 2022/12/12 17:36:58 - wtl-customers.0 - customerId =
    Line 98: 2022/12/12 17:36:58 - wtl-customers.0 - customerId =
    Line 118: 2022/12/12 17:36:58 - wtl-companys.0 - customerId =
    Line 133: 2022/12/12 17:36:58 - wtl-companys.0 - customerId =
    Line 148: 2022/12/12 17:36:58 - wtl-customers.0 - customerId =
    Line 163: 2022/12/12 17:36:58 - wtl-sales.0 - customerId =
    Line 188: 2022/12/12 17:36:58 - wtl-customers.0 - customerId =
    Line 203: 2022/12/12 17:36:58 - wtl-companys.0 - customerId =
    Line 218: 2022/12/12 17:36:58 - wtl-companys.0 - customerId =
    Line 233: 2022/12/12 17:36:58 - wtl-sales.0 - customerId =
    Line 258: 2022/12/12 17:36:58 - wtl-sales.0 - customerId =
    Line 283: 2022/12/12 17:36:58 - wtl-companys.0 - customerId =
    Line 298: 2022/12/12 17:36:58 - wtl-sales.0 - customerId =
    Line 323: 2022/12/12 17:36:58 - wtl-sales.0 - customerId =

    In your etl_customer job, where you set the variable into the current job (first image), why do you do this instead of simply passing the parameter down to the Process Tables job, where it is declared as a parameter (second image)? I understand that this is a simplified version and maybe you removed context that would require setting this as a local variable instead of using the parameter.


    When I do this, I get the 5 customer Ids uniquely for each subtransformation (customers, companys and sales).

    Line 68: 2022/12/12 17:20:52 - wtl-customers.0 - customerId = 4
    Line 83: 2022/12/12 17:20:52 - wtl-customers.0 - customerId = 3
    Line 98: 2022/12/12 17:20:52 - wtl-customers.0 - customerId = 5
    Line 118: 2022/12/12 17:20:52 - wtl-companys.0 - customerId = 3
    Line 133: 2022/12/12 17:20:52 - wtl-companys.0 - customerId = 4
    Line 148: 2022/12/12 17:20:52 - wtl-customers.0 - customerId = 1
    Line 163: 2022/12/12 17:20:52 - wtl-sales.0 - customerId = 3
    Line 188: 2022/12/12 17:20:52 - wtl-customers.0 - customerId = 2
    Line 203: 2022/12/12 17:20:52 - wtl-companys.0 - customerId = 5
    Line 218: 2022/12/12 17:20:52 - wtl-companys.0 - customerId = 1
    Line 233: 2022/12/12 17:20:52 - wtl-sales.0 - customerId = 4
    Line 258: 2022/12/12 17:20:52 - wtl-sales.0 - customerId = 5
    Line 283: 2022/12/12 17:20:52 - wtl-companys.0 - customerId = 2
    Line 296: 2022/12/12 17:20:52 - wtl-sales.0 - customerId = 1
    Line 323: 2022/12/12 17:20:52 - wtl-sales.0 - customerId = 2



    ------------------------------
    Stephen Donovan
    Digital Solutions Architect
    Hitachi Vantara
    ------------------------------



  • 7.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-13-2022 13:31
    I'm sorry Stephen. I revised the project and can now reproduce the problem better; the main Job is "start". I used a Wait for entry to simulate the parameter-passing problem: both the parameter and the variables end up with the last value set.

    ------------------------------
    Stevens Silva
    Chief Technology Officer
    Solcast
    ------------------------------

    Attachment(s)

    zip
    teste_v2.zip   9 KB 1 version


  • 8.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-13-2022 14:01

    If you could include your log and explain where the error is, that might help. Can you also let me know which version you are running?

    Here is my log, but I am not sure I see a problem: the last one after the sleep was Customer 2, and the log seems to show that value in place. But I could be reading this wrong.

    2022/12/13 13:54:07 - start - Start of job execution
    2022/12/13 13:54:07 - start - Starting entry [ETL - Customer 1]
    2022/12/13 13:54:07 - start - Launched job entry [ETL - Customer 1] in parallel.
    2022/12/13 13:54:07 - start - Starting entry [ETL - Customer 3]
    2022/12/13 13:54:07 - start - Launched job entry [ETL - Customer 3] in parallel.
    2022/12/13 13:54:07 - start - Starting entry [ETL - Customer 4]
    2022/12/13 13:54:07 - start - Launched job entry [ETL - Customer 4] in parallel.
    2022/12/13 13:54:07 - start - Starting entry [ETL - Customer 5]
    2022/12/13 13:54:07 - start - Launched job entry [ETL - Customer 5] in parallel.
    2022/12/13 13:54:07 - start - Starting entry [1 sec]
    2022/12/13 13:54:07 - start - Launched job entry [1 sec] in parallel.
    2022/12/13 13:54:07 - ETL - Customer 1 - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - etl_customer - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId: 1
    2022/12/13 13:54:07 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 13:54:07 - ETL - Customer 5 - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - etl_customer - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId: 5
    2022/12/13 13:54:07 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 13:54:07 - ETL - Customer 4 - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - etl_customer - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId: 4
    2022/12/13 13:54:07 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 13:54:07 - ETL - Customer 3 - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - etl_customer - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId: 3
    2022/12/13 13:54:07 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 13:54:07 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId Param: 4
    2022/12/13 13:54:07 - proccess_tables - Starting entry [3 sec]
    2022/12/13 13:54:07 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId Param: 5
    2022/12/13 13:54:07 - proccess_tables - Starting entry [3 sec]
    2022/12/13 13:54:07 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId Param: 1
    2022/12/13 13:54:07 - proccess_tables - Starting entry [3 sec]
    2022/12/13 13:54:07 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 13:54:07 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 13:54:07 - - customerId Param: 3
    2022/12/13 13:54:07 - proccess_tables - Starting entry [3 sec]
    2022/12/13 13:54:08 - start - Starting entry [ETL - Customer 2]
    2022/12/13 13:54:08 - ETL - Customer 2 - Using run configuration [Pentaho local]
    2022/12/13 13:54:08 - etl_customer - Starting entry [Log parameter]
    2022/12/13 13:54:08 - - customerId: 2
    2022/12/13 13:54:08 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 13:54:08 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 13:54:08 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 13:54:08 - - customerId Param: 2
    2022/12/13 13:54:08 - proccess_tables - Starting entry [3 sec]
    2022/12/13 13:54:10 - proccess_tables - Finished job entry [3 sec] (result=[true])
    2022/12/13 13:54:10 - proccess_tables - Finished job entry [Log parameter] (result=[true])
    2022/12/13 13:54:10 - etl_customer - Starting entry [Success]



    ------------------------------
    Stephen Donovan
    Digital Solutions Architect
    Hitachi Vantara
    ------------------------------



  • 9.  RE: Run the same Job simultaneously with different parameters in Pentaho PDI

    Posted 12-13-2022 14:22

    Customer Id 2 starts 1 second after the others. In the proccess_tables Job I included a Wait for of 3 seconds, to hold the other processes and reproduce the error in the log: after customer 2 starts, the variables take the value 2, ignoring the parameter passed to the other Jobs running in parallel.

    I'm using version 9.3.0.0-428

    2022/12/13 15:28:27 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:28:40 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:29:38 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:29:38 - Spoon - Connected to metastore : Teste, added to delegating metastore
    2022/12/13 15:29:38 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:30:11 - Spoon - Spoon
    2022/12/13 15:30:18 - Spoon - Spoon
    2022/12/13 15:34:50 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:34:50 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 15:34:50 - Spoon - Connected to metastore : ETL Geral, added to delegating metastore
    2022/12/13 15:34:50 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 16:13:26 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 16:13:27 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 16:13:27 - Spoon - Connected to metastore : Teste, added to delegating metastore
    2022/12/13 16:13:27 - RepositoriesMeta - Reading repositories XML file: C:\Users\steve\.kettle\repositories.xml
    2022/12/13 16:14:29 - Spoon - Iniciando o job...
    2022/12/13 16:14:29 - start - Início da execução do job
    2022/12/13 16:14:29 - start - Starting entry [ETL - Customer 1]
    2022/12/13 16:14:29 - start - Launched job entry [ETL - Customer 1] in parallel.
    2022/12/13 16:14:29 - start - Starting entry [ETL - Customer 3]
    2022/12/13 16:14:29 - start - Launched job entry [ETL - Customer 3] in parallel.
    2022/12/13 16:14:29 - start - Starting entry [ETL - Customer 4]
    2022/12/13 16:14:29 - start - Launched job entry [ETL - Customer 4] in parallel.
    2022/12/13 16:14:29 - start - Starting entry [ETL - Customer 5]
    2022/12/13 16:14:29 - start - Launched job entry [ETL - Customer 5] in parallel.
    2022/12/13 16:14:29 - start - Starting entry [1 sec]
    2022/12/13 16:14:29 - start - Launched job entry [1 sec] in parallel.
    2022/12/13 16:14:29 - ETL - Customer 1 - Using run configuration [Pentaho local]
    2022/12/13 16:14:29 - etl_customer - Starting entry [Log parameter]
    2022/12/13 16:14:29 - - customerId: 1
    2022/12/13 16:14:29 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 16:14:29 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 16:14:29 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 16:14:29 - - customerId Param: 1
    2022/12/13 16:14:29 - proccess_tables - Starting entry [3 sec]
    2022/12/13 16:14:30 - ETL - Customer 3 - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - ETL - Customer 4 - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - ETL - Customer 5 - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - etl_customer - Starting entry [Log parameter]
    2022/12/13 16:14:30 - etl_customer - Starting entry [Log parameter]
    2022/12/13 16:14:30 - etl_customer - Starting entry [Log parameter]
    2022/12/13 16:14:30 - - customerId: 3
    2022/12/13 16:14:30 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 16:14:30 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - - customerId: 5
    2022/12/13 16:14:30 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 16:14:30 - - customerId: 4
    2022/12/13 16:14:30 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 16:14:30 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 16:14:30 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 16:14:30 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 16:14:30 - - customerId Param: 4
    2022/12/13 16:14:30 - proccess_tables - Starting entry [3 sec]
    2022/12/13 16:14:30 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 16:14:30 - - customerId Param: 4
    2022/12/13 16:14:30 - proccess_tables - Starting entry [3 sec]
    2022/12/13 16:14:30 - - customerId Param: 4
    2022/12/13 16:14:30 - proccess_tables - Starting entry [3 sec]
    2022/12/13 16:14:31 - start - Starting entry [ETL - Customer 2]
    2022/12/13 16:14:31 - ETL - Customer 2 - Using run configuration [Pentaho local]
    2022/12/13 16:14:31 - etl_customer - Starting entry [Log parameter]
    2022/12/13 16:14:31 - - customerId: 2
    2022/12/13 16:14:31 - etl_customer - Starting entry [Proccess Tables]
    2022/12/13 16:14:31 - Proccess Tables - Using run configuration [Pentaho local]
    2022/12/13 16:14:31 - proccess_tables - Starting entry [Log parameter]
    2022/12/13 16:14:31 - - customerId Param: 2
    2022/12/13 16:14:31 - proccess_tables - Starting entry [3 sec]
    2022/12/13 16:14:32 - proccess_tables - Starting entry [Get Configuration]
    2022/12/13 16:14:32 - Get Configuration - Using run configuration [Pentaho local]
    2022/12/13 16:14:32 - Get Configuration - Running transformation using the Kettle execution engine
    2022/12/13 16:14:32 - get_configuration_custumer - Expedindo início para transformação [get_configuration_custumer]
    2022/12/13 16:14:32 - Get variables.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
    2022/12/13 16:14:32 - Set variables.0 - Setting environment variables...
    2022/12/13 16:14:32 - Set variables.0 - Set variable testNewVariable to value [2]
    2022/12/13 16:14:32 - Set variables.0 - Finished after 1 rows.
    2022/12/13 16:14:32 - Set variables.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
    2022/12/13 16:14:32 - Log Get Configuration.0 -
    2022/12/13 16:14:32 - Log Get Configuration.0 - ------------> Linenr 1------------------------------
    2022/12/13 16:14:32 - Log Get Configuration.0 - customerId = 2
    2022/12/13 16:14:32 - Log Get Configuration.0 -
    2022/12/13 16:14:32 - Log Get Configuration.0 - ====================
    2022/12/13 16:14:32 - Log Get Configuration.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
    2022/12/13 16:14:32 - procces