Gustavo Orair

Improve table input step performance on Oracle

Discussion created by Gustavo Orair on Jan 17, 2018

I have some ETLs and identified the Table input is the bottleneck causing degrading the overall performance.

Table input step speed is 60-80 rows/second on a row with just 4 number and 1 date (I suppose no more than 40 bytes/row).

 

The very low performance is probably caused by the high latency on this network. The table input is used to extract information from a remote database (other institution).

 

Executing the same query using another software (Toad) on the same client machine could read 500 rows/second.

I was figuring out how to configure the Oracle connections better on PDI to improve the table input step performance.

For example, using JDBC-OCI instead of pure JDBC and/or specifying greater fetch size to reduce the round-trips.

 

As I read on Oracle documentation, the default row-prefetch is 10:

* https://docs.oracle.com/cd/A97335_02/apps.102/a83724/oraperf2.htm

"If you do not set the default row-prefetch value for a connection, 10 is the default."

 

As suggested on PDI Wiki, we´ve configured the defaultRowPrefetch in Options tab:

* https://wiki.pentaho.com/display/EAI/Oracle

"Performance Considerations: Standard Fetch Size and Row Prefetching

In case you have a slow connection and/or your latency is high (e.g. via a VPN), prefetching data into the client reduces the number of round trips to the server and you can specify the number of rows to fetch with each trip.

Set the defaultRowPrefetch and fetchSize properties in the database connection dialog."

 

But this configuration didn´t change the performance.

I supposse the configuration was ignored because PDI has a Fixed Database fetchsize and oracle documentation inform to not mix fetch size and oracle prefetching API. If PDI has a 5000 fixed database fetchsize, the row-prefetch value was not used as the fetch size.

 

PDI Fixed Database fetchsize:

* http://jira.pentaho.com/browse/PDI-2458

* http://jira.pentaho.com/browse/PDI-5491

 

Oracle documentation informing to not mix fetch size and defaultRowPrefetch:

* https://docs.oracle.com/cd/A97335_02/apps.102/a83724/oraperf2.htm

"Do not mix the JDBC 2.0 fetch size API and the Oracle row-prefetching API in your application."

 

Row-pretech value is used as the default fetch size:

* https://docs.oracle.com/cd/E11882_01/java.112/e16548/resltset.htm#JJDBC28621:

"Standard JDBC also enables you to specify the number of rows fetched with each database round-trip for a query, and this number is referred to as the fetch size. In Oracle JDBC, the row-prefetch value is used as the default fetch size in a statement object. Setting the fetch size overrides the row-prefetch setting and affects subsequent queries run through that statement object."

 

We´ve also changed from JDBC to JDBC-OCI but the table input step performance didn´t changed.

 

 

Does anyone already configured the defaultRowPrefetch (EXTRA_OPTION_ORACLE.defaultRowPrefetch) succesfully?

 

Do you have any suggestion on this case?

Outcomes