Pentaho


REST Client speed (persistence required?)

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Nikola Garafolic posted 06-17-2019 10:09

I am using the REST Client step to fetch data from an API. I have to make thousands of requests, but the REST Client step gives poor performance (3-5 requests per second) because the client is not persistent and opens a new connection for every request.

What can be done, other than running multiple REST Client steps in parallel?

In the REST Client step I set an Authorization header and an Accept header.
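For reference, outside PDI the connection-reuse pattern the question is after looks roughly like this in plain Java 11+: one `java.net.http.HttpClient` is created once and shared by every request, so open keep-alive connections can be reused instead of reconnecting each time. This is a sketch under assumptions, not how the REST Client step works internally; the class name `ReusedClient`, the `Bearer` token scheme and the `application/json` media type are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: every request goes through the same HttpClient,
// which pools connections and reuses them between calls.
class ReusedClient {
    // Created once; HttpClient is immutable and safe to share between threads.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Builds a GET request with the two headers mentioned in the question.
    // "Bearer" and "application/json" are assumed values.
    static HttpRequest buildRequest(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends one request over the shared client and returns the response body.
    static String fetch(String url, String token) throws Exception {
        HttpResponse<String> resp =
                CLIENT.send(buildRequest(url, token), HttpResponse.BodyHandlers.ofString());
        return resp.body();
    }
}
```

Calling `ReusedClient.fetch(url, token)` in a loop avoids a new handshake per request. Whether that raises throughput noticeably depends on the server: if the endpoint itself takes most of the time per request, connection reuse only trims the connection setup cost.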


David da Guia Carvalho

You could use a User Defined Java Class step and make the requests yourself, but that would not only be hard work, it might also mess up the natural row flow of the transformation (for example, if you loop inside the class).

A starting point for an HTTP request could look something like this:

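import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        // No more input rows: signal the end of the stream and stop
        setOutputDone();
        return false;
    }

    // Hard-coded for testing; to read the URL from an input field instead:
    // String url = get(Fields.In, "myurl").getString(r);
    String url = "http://localhost/";

    StringBuilder content = new StringBuilder();
    try {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("GET");
        // Headers from the question; the values are placeholders
        // con.setRequestProperty("Authorization", "Bearer <token>");
        // con.setRequestProperty("Accept", "application/json");

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                content.append(line).append('\n');
            }
        } finally {
            // Reading the body fully and closing the stream lets the JVM keep the
            // connection alive for the next row; con.disconnect() may close the socket.
            in.close();
        }
    } catch (MalformedURLException ex) {
        throw new KettleException("Problems in URL: " + url, ex);
    } catch (IOException ex) {
        throw new KettleException("Problems reading data from " + url, ex);
    }

    // "response" must be declared as an output field on the step's Fields tab
    r = createOutputRow(r, data.outputRowMeta.size());
    get(Fields.Out, "response").setValue(r, content.toString());
    putRow(data.outputRowMeta, r);
    return true;
}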

Nikola Garafolic

It seems the remote endpoint is the cause of the slow speeds I am getting. I think so because I also tried querying it with curl and the speed was the same, around 5 requests per second on average.
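At about 5 requests per second, each request takes roughly 1/5 s = 200 ms end to end. If most of that time is spent by the server, reusing connections cannot help much, and throughput then grows mainly with the number of requests in flight, roughly N times for N concurrent requests if the endpoint tolerates the load. In PDI terms that corresponds to starting several copies of the REST Client step. As a plain Java 11+ illustration, here is a minimal sketch of bounded concurrency over one shared client; `ParallelFetcher`, `fetchAll`, the `parallelism` value and the `Accept` value are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetcher {
    // Fetches every URL with at most `parallelism` requests in flight at once.
    // All tasks share one HttpClient, so its connection pool is reused.
    // Bodies are returned in the same order as `urls`.
    static List<String> fetchAll(HttpClient client, List<String> urls, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                        .header("Accept", "application/json") // placeholder media type
                        .GET()
                        .build();
                Callable<String> task =
                        () -> client.send(req, HttpResponse.BodyHandlers.ofString()).body();
                futures.add(pool.submit(task));
            }
            List<String> bodies = new ArrayList<>();
            for (Future<String> f : futures) {
                // Blocks until this request finishes; a failed request is
                // rethrown wrapped in an ExecutionException
                bodies.add(f.get());
            }
            return bodies;
        } finally {
            pool.shutdown(); // accept no new tasks; worker threads exit once idle
        }
    }
}
```

For example, `ParallelFetcher.fetchAll(HttpClient.newHttpClient(), urls, 8)` keeps up to eight requests running. The fixed pool is the simplest way to cap concurrency so the API is not flooded; the right cap depends on what the endpoint and its rate limits allow.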