Hi All,
My name is Enrique del Pino and I am working on a project for The National Archives in the UK, for which we have been building an extensive ETL process based on Pentaho 9.1, using Kettle as our data integration tool.
I have noticed that, after running our workflows, the Pentaho UI remains somewhat unresponsive for quite a while, even though all transformations and sub-transformations in the workflow have finished. Investigating this, I found that a thread created by Pentaho keeps running at almost 100% CPU for between 45 minutes and 1 hour in the scenarios we tested.
Investigating further, we found in the thread dump that this thread (32001 in the dump attached below) is executing code related to TransformationAnalyzer and StepAnalyzer, which appears to be building some sort of TinkerPop graph internally.
Our team's knowledge of Pentaho internals does not reach this deep, so we would greatly appreciate it if someone could shed some light on the issue. We would like to understand what this TransformationAnalyzer is, why it spends close to an hour on this task and, ultimately, whether there is anything we can do to improve its performance.
Any help, ideas or thoughts are welcome!
Thanks,
Enrique
Thread 32001: (state = IN_JAVA)
 - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1473 (Compiled frame; information may be imprecise)
 - java.util.HashMap$ValueIterator.next() @bci=1, line=1498 (Compiled frame)
 - java.util.AbstractCollection.toArray() @bci=39, line=141 (Compiled frame)
 - java.util.ArrayList.<init>(java.util.Collection) @bci=5, line=178 (Compiled frame)
 - com.tinkerpop.blueprints.impls.tg.TinkerGraph.getVertices() @bci=13, line=279 (Compiled frame)
 - com.tinkerpop.blueprints.impls.tg.TinkerGraph.getVertices(java.lang.String, java.lang.Object) @bci=33, line=153 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.getVertexForNode(org.pentaho.metaverse.api.IMetaverseNode) @bci=51, line=324 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.addLink(org.pentaho.metaverse.api.IMetaverseLink) @bci=7, line=103 (Compiled frame)
 - org.pentaho.metaverse.api.model.BaseMetaverseBuilder.addLink(org.pentaho.metaverse.api.IMetaverseNode, java.lang.String, org.pentaho.metaverse.api.IMetaverseNode) @bci=33, line=472 (Compiled frame)
 - org.pentaho.metaverse.impl.DocumentController.addLink(org.pentaho.metaverse.api.IMetaverseNode, java.lang.String, org.pentaho.metaverse.api.IMetaverseNode) @bci=7, line=416 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.processInputs(org.pentaho.di.trans.step.BaseStepMeta) @bci=234, line=433 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.di.trans.step.BaseStepMeta) @bci=193, line=149 (Compiled frame)
 - org.pentaho.metaverse.api.analyzer.kettle.step.StepAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, java.lang.Object) @bci=6, line=72 (Compiled frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.di.base.AbstractMeta, org.pentaho.metaverse.api.IMetaverseNode, java.lang.String) @bci=637, line=230 (Interpreted frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, org.pentaho.metaverse.api.IDocument) @bci=205, line=124 (Interpreted frame)
 - org.pentaho.metaverse.analyzer.kettle.TransformationAnalyzer.analyze(org.pentaho.metaverse.api.IComponentDescriptor, java.lang.Object) @bci=6, line=69 (Interpreted frame)
 - org.pentaho.metaverse.util.MetaverseUtil$1.run() @bci=62, line=220 (Interpreted frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=750 (Interpreted frame)
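
Reading the top frames of the trace, our tentative guess (an assumption on our part, not something we have confirmed in the Pentaho or TinkerPop source) is that each addLink call resolves its node through TinkerGraph.getVertices(key, value), which here goes on to copy the whole vertex collection into a new ArrayList before filtering it. If that happens once per link, the total work would grow roughly with (number of links) x (number of vertices). The sketch below is not Pentaho code; VertexLookupSketch and all of its methods are hypothetical names we made up purely to illustrate the difference between a copy-and-scan lookup and a keyed lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the metaverse or TinkerGraph implementation.
public class VertexLookupSketch {

    // Minimal stand-in for a graph vertex with a single searchable property.
    static final class Vertex {
        final String id;
        final String name;
        Vertex(String id, String name) { this.id = id; this.name = name; }
    }

    private final Map<String, Vertex> vertices = new HashMap<>();
    // Secondary map from property value to vertices, i.e. a simple key index.
    private final Map<String, List<Vertex>> nameIndex = new HashMap<>();

    void addVertex(String id, String name) {
        Vertex v = new Vertex(id, name);
        vertices.put(id, v);
        nameIndex.computeIfAbsent(name, k -> new ArrayList<>()).add(v);
    }

    // Copy-and-scan lookup: cost is proportional to the number of vertices,
    // which is the pattern the top frames of our thread dump suggest.
    List<Vertex> findByScan(String name) {
        List<Vertex> all = new ArrayList<>(vertices.values());
        List<Vertex> hits = new ArrayList<>();
        for (Vertex v : all) {
            if (v.name.equals(name)) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Keyed lookup: a single hash probe regardless of graph size.
    List<Vertex> findByIndex(String name) {
        return nameIndex.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        VertexLookupSketch g = new VertexLookupSketch();
        int n = 5_000; // arbitrary size chosen for the demonstration
        for (int i = 0; i < n; i++) {
            g.addVertex("v" + i, "name" + i);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            g.findByScan("name" + i);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            g.findByIndex("name" + i);
        }
        long t2 = System.nanoTime();

        System.out.printf("%d scan lookups: %d ms, %d indexed lookups: %d ms%n",
                n, (t1 - t0) / 1_000_000, n, (t2 - t1) / 1_000_000);
    }
}
```

If this reading is right, the runtime would depend mainly on how many steps and fields our transformations contain rather than on how much data they process, which would fit with the thread staying busy after the workflow itself has finished. We would be glad to be corrected if the actual cause is something else.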