 PDI performance degradation

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Craig Shatswell posted 06-24-2019 19:23

Hello, I am analyzing why our ETL process using PDI runs progressively slower with each execution when it is kept under heavy load for hours.

This process has been in place for years and has migrated across multiple versions of PDI (Kettle). I believe it was first created on version 4. This year, we upgraded from version 5.4 to 8.0. Since the upgrade, performance goes from fast to extremely slow over a few hours of constant processing.

Normally, this process takes 2-5 minutes to complete. However, after hours of back-to-back processing, a single run takes 25-35 minutes. On version 5.4, we did not see this type of performance issue, but there were other issues.

This process has hundreds of steps, and we have not yet tried to improve performance by changing the ktr file. We have ensured the environment is set up correctly and checked as many environment variables as we know to check.

Here are some specifics about our environment:

Kettle 8.0.0.6-352

Tomcat 8.5.32

PostgreSQL 10 on both databases

PostgreSQL JDBC driver 42.2.2

Java 8 (Oracle 1.8.0_181)

Currently, I have VisualVM connected to the server to watch performance. I notice the heap grows over time. Garbage collection runs, but it does not seem to reclaim as much memory as it should.
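One way to put numbers on the observation above is to log `jstat -gcutil <pid> <interval>` while the load runs and check whether old-generation occupancy still climbs right after each full GC. The following is only a minimal sketch: the sample rows are made up for illustration, the helper names are hypothetical, and it assumes the Java 8 column layout of `jstat -gcutil` (with `O` for old-gen percentage and `FGC` for the full-GC count).

```python
# Illustrative jstat -gcutil output (values are invented, not from this system).
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.10  62.30  40.20  95.10  90.30    120    2.310     2    0.800    3.110
  0.00  12.40  10.50  35.00  95.20  90.30    180    3.520     3    1.300    4.820
 33.00   0.00  71.20  58.70  95.20  90.40    260    5.010     3    1.300    6.310
  0.00  20.10   5.90  49.50  95.30  90.40    330    6.480     4    1.900    8.380
  0.00  51.00  80.00  71.30  95.30  90.40    410    8.000     4    1.900    9.900
  0.00   8.70   3.20  63.80  95.40  90.50    480    9.650     5    2.600   12.250
"""


def parse_gcutil(text):
    """Parse jstat -gcutil text into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def old_gen_after_full_gc(samples):
    """Old-gen occupancy (%) at each sample where the full-GC count increased."""
    floors = []
    prev_fgc = None
    for sample in samples:
        if prev_fgc is not None and sample["FGC"] > prev_fgc:
            floors.append(sample["O"])
        prev_fgc = sample["FGC"]
    return floors


def is_steadily_rising(values, min_points=3):
    """True if there are enough points and each one is higher than the last."""
    return len(values) >= min_points and all(
        b > a for a, b in zip(values, values[1:])
    )


floors = old_gen_after_full_gc(parse_gcutil(SAMPLE))
print(floors, is_steadily_rising(floors))
```

If the occupancy measured just after full collections keeps rising over hours, that suggests retained objects rather than a heap that is merely sized too small. The sampling interval limits how close each reading is to the actual collection, so treat the result as a rough indicator.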

Our current workaround is to restart the Tomcat server, which brings performance back to the acceptable range. However, this is not a desirable or sustainable solution.

Has anyone else faced a similar issue?

If so, how did you resolve it?

Any other ideas for pinpointing the issue?

Were there changes from version 5.4 to 8.0 that would cause this type of performance degradation?

Any help is greatly appreciated.

Craig


Dean Flinter

Unfortunately, I can't help specifically, but going from 5.4 to 8.0 would also require a change in Java version (version 7, I believe?).

If no specific change to PDI is causing this for you, perhaps it is an issue with Java itself.

Craig Shatswell

Actually, the requirements state it is Java 8 that is required for PDI 8.0.  We already made this upgrade from Java 7 to Java 8.

David da Guia Carvalho

Hi,

When I upgraded, I had some problems with data types, conversions, and lenience. Anyway, I suggest that you trace your execution to see which steps are showing the degradation, and also watch your hardware consumption (memory, disk, I/O, CPU, swap, network).

Craig Shatswell

I have not tried this yet.  Thanks for the suggestions.  

I find in the heap dumps that java.lang.ref.Finalizer instances are not getting cleared out of the heap (see the attached heapdump6-20).

Any ideas about this leak?
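To see which classes are accumulating alongside java.lang.ref.Finalizer, one option is to take two class histograms with `jmap -histo <pid>`, one shortly after a restart and one after hours of load, and diff the instance counts. The sketch below assumes the Java 8 `jmap -histo` row layout (rank, instance count, bytes, class name); the two histograms are invented for illustration and the function names are hypothetical.

```python
import re

# Matches rows such as "   3:   1200   48000  java.lang.ref.Finalizer".
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

# Illustrative histograms only; the numbers are not from the system discussed here.
BEFORE = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:        210000       18400000  [C
   2:        205000        4920000  java.lang.String
   3:          1200          48000  java.lang.ref.Finalizer
"""

AFTER = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:        900000       36000000  java.lang.ref.Finalizer
   2:        230000       20100000  [C
   3:        221000        5304000  java.lang.String
"""


def parse_histo(text):
    """Return {class name: instance count} from jmap -histo text."""
    counts = {}
    for line in text.splitlines():
        match = HISTO_ROW.match(line)
        if match:
            instances, _size, cls = match.groups()
            counts[cls] = int(instances)
    return counts


def instance_growth(before, after, top=5):
    """Classes ranked by increase in instance count between two histograms."""
    growth = {cls: n - before.get(cls, 0) for cls, n in after.items()}
    ranked = sorted(growth.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


for cls, delta in instance_growth(parse_histo(BEFORE), parse_histo(AFTER)):
    print(f"{delta:>10}  {cls}")
```

A Finalizer entry exists for each live object whose class overrides finalize(), so the classes that grow in step with it are the likely candidates to inspect in the heap dump. Note that `jmap -histo:live` forces a full GC before counting, which can be disruptive on a loaded server.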

David da Guia Carvalho

You could check your "step metrics" to find exactly where your bottleneck is. I suggest, if possible, doing this in Spoon during the execution. That should give you some hints about the point(s) of degradation. Also keep following along with iotop and other monitoring tools.
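Once per-step timings have been collected for an early (fast) run and a late (slow) run, comparing them side by side can narrow things down. This is a rough sketch under stated assumptions: the step names and durations are hypothetical, and it assumes the timings have already been copied out of the step metrics into plain dictionaries of seconds per step.

```python
def rank_step_slowdown(early, late):
    """Return (step, late/early ratio) pairs, worst slowdown first.

    Steps missing from the late run, or with a non-positive early
    duration, are skipped because no ratio can be computed for them.
    """
    ratios = []
    for step, early_secs in early.items():
        late_secs = late.get(step)
        if late_secs is None or early_secs <= 0:
            continue
        ratios.append((step, late_secs / early_secs))
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


# Hypothetical step names and durations in seconds, for illustration only.
early_run = {"Table input": 20.0, "Lookup customer": 30.0, "Table output": 40.0}
late_run = {"Table input": 25.0, "Lookup customer": 600.0, "Table output": 60.0}

for step, ratio in rank_step_slowdown(early_run, late_run):
    print(f"{ratio:6.2f}x  {step}")
```

A step whose ratio is far above the rest is the first place to look. If every step slows by a similar factor, the cause is more likely global (for example GC pressure) than a single step.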

Steven Brown

Hi Craig,

If you're running hours of back-to-back processes and notice that JVM memory isn't being freed as expected during GC, you might try adding these JVM options to the OPT environment variable in start-pentaho.{sh|bat}:

-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts

Good luck,

Steven

Herve Naga

Hello,

I am a newbie, but I am writing because I also ran into performance problems on PDI 8.0. I have since updated to PDI 8.2 (the latest version), which is better. Note that I do not have thousands of steps as you do. You might also update your spoon.bat or spoon.sh as Graig said. However, I would change the following line (adjust the values to suit):

if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m" "-XX:MaxPermSize=256m"

Perhaps combine this line with the options Graig suggested.

Thanks