 Table Input to a batch REST call

Archive User posted 09-16-2021 19:51

I need to save data by calling a REST API. This REST API accepts data in batches, which improves performance: in one single call, I can save data from one thousand Table Input rows.

 

What I need is to read a thousand rows from a Table Input step, concatenate each row into a JSON string, and then perform one call to the REST API. So, one REST API call for every 1,000 rows.

 

As there is a limit on the amount of data I can send to the REST API in one call, I need to perform several calls. The total amount of data is around 50 thousand rows, so my plan is to perform 50 calls with 1,000 rows each. This performs much better than 50 thousand individual calls to the REST API.

 

Any idea how I can perform this "read, concatenate, call REST API" process with PDI?

Archive User

After struggling a lot, I found a solution, so now I can answer my own question.

 

The best way to implement a solution to the question above is to build it around a 'Group by' step:

 

1. Create an 'Add sequence' step starting at zero and incremented by 1.

  

2. Create a Calculator step:

 2.1. Set a constant 'group_size' to the number of rows per group (1,000 in this example). Set the type as Integer.

 2.2. Create a field (I named it 'batch_group') by dividing the sequence (Field A) by 'group_size' (Field B). Set the type as Integer, so the division rounds down: rows 0-999 get batch_group 0, rows 1000-1999 get 1, and so on.

 

3. Create a 'Group by' step:

 3.1. Set the group field as 'batch_group'.

 3.2. Add one aggregate with a name such as 'body_batch', set the column to be concatenated as the Subject, and choose the type 'Concatenate strings separated by ,'.

  

With these steps, the output rows are grouped by the value of 'batch_group', and all values of the chosen column are concatenated, separated by commas, into the aggregate column named in step 3.2. Each resulting row can then drive a single REST API call (see the sketch below).
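
For anyone who wants to see the same logic spelled out outside of PDI, here is a minimal Python sketch of the approach, assuming each row already arrives as a JSON string. The sample 'rows' list, the batch_rows helper, and the endpoint URL are hypothetical stand-ins for illustration, not part of the transformation itself.

    import requests

    GROUP_SIZE = 1000  # mirrors the 'group_size' constant in the Calculator step

    # Stand-in for the Table Input output: one JSON string per row.
    rows = ['{"id": %d}' % i for i in range(50000)]

    def batch_rows(rows, group_size=GROUP_SIZE):
        # 'Add sequence' starting at 0, then batch_group = sequence // group_size
        # (the integer division from the Calculator step), then the 'Group by'
        # concatenation per batch_group.
        batches = {}
        for seq, row_json in enumerate(rows):
            batch_group = seq // group_size
            batches.setdefault(batch_group, []).append(row_json)
        # 'Concatenate strings separated by ,' for each group
        return [",".join(group) for group in batches.values()]

    # One REST call per concatenated batch: 50 calls for 50,000 rows.
    for body_batch in batch_rows(rows):
        requests.post("https://example.com/api/save",  # hypothetical endpoint
                      data="[" + body_batch + "]",
                      headers={"Content-Type": "application/json"})

In the transformation itself, the equivalent of the loop above is simply a REST Client step placed after the 'Group by' step, which receives one row, and therefore performs one call, per batch.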