Pentaho


 Pentaho Kettle locking up while running TransformationAnalyzer

Enrique del Pino posted 07-07-2022 08:51

Hi All,

My name is Enrique del Pino and I am working on a project for The National Archives in the UK, for which we have been building an extensive ETL process based on Pentaho 9.1, using Kettle as our data integration tool.

I have noticed that after running our workflows, the Pentaho UI remains a bit unresponsive for quite a while, even though all transformations and sub-transformations in the workflow have finished. Investigating this, I found that a thread created by Pentaho keeps running at almost 100% CPU for between 45 minutes and 1 hour in the scenarios we tested.

We investigated this further and found in the thread dump that this thread (32001 in the dump I'm attaching) is executing code related to TransformationAnalyzer and StepAnalyzer, which seems to be building some sort of TinkerPop graph internally.

Our team's knowledge does not reach this deep into Pentaho, so we'd greatly appreciate it if someone could shed some light on the issue. We would like to understand what this TransformationAnalyzer is, why it spends close to an hour performing this task and, ultimately, whether there is something we can do to improve its performance.

Any help, ideas or thoughts are welcome!

Thanks,

Enrique

Thread 32001: (state = IN_JAVA)
 - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1473 (Compiled frame; information may be imprecise)
 - java.util.HashMap$ValueIterator.next() @bci=1, line=1498 (Compiled frame)
 - java.util.AbstractCollection.toArray() @bci=39, line=141 (Compiled frame)
 - java.util.ArrayList.<init>(java.util.Collection) @bci=5, line=178 (Compiled frame)
 - com.tinkerpop.blueprints.impls.tg.TinkerGraph.getVertices() @bci=13, line=279 (Compiled frame)
 - com.tinkerpop.blueprints.impls.tg.TinkerGraph.getVertices(java.lang.String, java.lang.Object) @bci=33, line=153 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.getVertexForNode(org.pentaho.metaverse.api.IMetaverseNode) @bci=51, line=324 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.addLink(org.pentaho.metaverse.api.IMetaverseLink) @bci=7, line=103 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.addLink(org.pentaho.metaverse.api.IMetaverseNode, java.lang.String, org.pentaho.metaverse.api.IMetaverseNode) @bci=33, line=472 (Compiled frame)
 - org.pentaho.metaverse.impl.DocumentController.addLink(org.pentaho.metaverse.api.IMetaverseNode, java.lang.String, org.pentaho.metaverse.api.IMetaverseNode) @bci=7, line=416 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.processInputs(org.pentaho.di.trans.step.BaseStepMeta) @bci=234, line=433 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.di.trans.step.BaseStepMeta) @bci=193, line=149 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, java.lang.Object) @bci=6, line=72 (Compiled frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.di.base.AbstractMeta, org.pentaho.metaverse.api.IMetaverseNode, java.lang.String) @bci=637, line=230 (Interpreted frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.metaverse.api.IDocument) @bci=205, line=124 (Interpreted frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, java.lang.Object) @bci=6, line=69 (Interpreted frame)
 - org.pentaho.metaverse.util.MetaverseUtil$1.run() @bci=62, line=220 (Interpreted frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=750 (Interpreted frame)
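One possible reading of the trace, offered as an interpretation rather than a confirmed diagnosis: each addLink call resolves a vertex through getVertices(key, value), and that call in turn copies the whole vertex collection into a new ArrayList. If that happens for every link, the work grows roughly with (number of vertices) x (number of links). The sketch below illustrates that cost pattern with toy types only; the class and method names are hypothetical and nothing here is Pentaho or TinkerPop code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VertexLookupSketch {
    /** Builds a toy vertex store: internal key -> value of a hypothetical "id" property. */
    static Map<Integer, String> buildVertices(int count) {
        Map<Integer, String> vertices = new HashMap<>();
        for (int i = 0; i < count; i++) {
            vertices.put(i, "node-" + i);
        }
        return vertices;
    }

    /**
     * Property lookup by full scan: copies all vertices into a list and
     * compares them one by one. Returns how many vertices were examined.
     */
    static int scanLookupCost(Map<Integer, String> vertices, String wantedId) {
        List<String> all = new ArrayList<>(vertices.values()); // full copy on every call
        int examined = 0;
        for (String id : all) {
            examined++;
            if (id.equals(wantedId)) {
                break;
            }
        }
        return examined;
    }

    /** Total vertices examined when each link triggers one scan that matches nothing (worst case). */
    static long totalScanCost(int vertexCount, int linkCount) {
        Map<Integer, String> vertices = buildVertices(vertexCount);
        long total = 0;
        for (int i = 0; i < linkCount; i++) {
            total += scanLookupCost(vertices, "absent");
        }
        return total;
    }

    public static void main(String[] args) {
        // 1,000 vertices and 1,000 links already mean a million comparisons.
        System.out.println(totalScanCost(1_000, 1_000));
    }
}
```

Under these assumptions a workflow with many steps and fields could plausibly keep one thread busy for a long time, which would be consistent with the CPU usage described above.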
Andrew Cave
Hi Enrique

You've got the DataLineage plugin turned on. The documentation at https://help.hitachivantara.com/Documentation/Pentaho/9.3/Products/Data_lineage explains how to turn it on, and I presume turning it off is doing the opposite : )
Enrique del Pino

Hi Andrew, thanks for your prompt answer!

What you have shown is quite interesting, as I was guessing from the stack trace that some kind of graph DB was being built. Unfortunately, I've checked the Pentaho configuration in \system\karaf\etc\pentaho.metaverse.cfg and I've got lineage.execution.runtime=off.

I've also tried to locate the generated file on my file system, in the folder set by the config line lineage.execution.output.folder=./pentaho-lineage-output, with no success.
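For anyone who wants to double-check the same two settings programmatically, here is a minimal sketch using java.util.Properties. The class and method names are made up for illustration, and the inline text mirrors only the two config lines quoted above, not the full contents of pentaho.metaverse.cfg. It only confirms what the file says; it cannot tell whether Pentaho actually acts on the value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class LineageConfigCheck {
    /** Parses key=value text in the standard Java properties format. */
    static Properties parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True only if the file explicitly sets lineage.execution.runtime to "off". */
    static boolean runtimeLineageOff(Properties props) {
        String value = props.getProperty("lineage.execution.runtime", "");
        return "off".equalsIgnoreCase(value.trim());
    }

    public static void main(String[] args) {
        // Only the two lines quoted in this thread; a real file may contain more.
        String cfg = "lineage.execution.runtime=off\n"
                   + "lineage.execution.output.folder=./pentaho-lineage-output\n";
        Properties props = parse(cfg);
        System.out.println("runtime lineage off: " + runtimeLineageOff(props));
        System.out.println("output folder: " + props.getProperty("lineage.execution.output.folder"));
    }
}
```

To run it against a real installation, the string could be replaced with the contents of the actual config file, read from wherever Pentaho is installed.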

So it seems Pentaho is still processing the data and building the graph, even with lineage disabled. Do you know if there might be another way to stop that functionality, as we're unlikely to make use of it?

Regards,

Enrique

Carlos Lopez
There was a bug, https://jira.pentaho.com/browse/PDI-18970, whose root cause was the flag you are both referring to: lineage.execution.runtime=off was not being honored, so Spoon continued to execute the lineage analysis whether the flag was on or off. This bug was fixed in 9.3, so perhaps you want to give your ETL a try on that version.