Pentaho

 Is there a way to prevent PDI from trying to connect to a DB connection when loading a transformation step?

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Luis Villegas posted 03-09-2021 23:04

EDIT: This is the same thing as this issue from 2010 that's marked as "fixed":

https://jira.pentaho.com/browse/PDI-3023

I'm on PDI 8.3, and I'm connecting to a remote repo.

 

When working on Transformations, I'm using DB connections that can't be accessed locally. This is fine, since I test the ETLs remotely on a server that does have access to the DBs.

 

The problem is that making even a simple change in a step that uses a connection takes up to 8 minutes. For example, opening a Table Output step configured with a DB connection that can't be accessed locally freezes PDI for up to 2 minutes, until the command line finally says "Unable to get fields from previous steps because of an error". That error is fine for my purposes: I don't need to get the fields, and I don't need to connect to these DBs locally. If I then click on the "Connections" dropdown, it freezes again for 1-2 minutes. Once it responds again, just selecting a new connection freezes things for an additional 1-2 minutes. If I start typing a DB name or table name, it freezes after every letter, and clicking "OK" freezes it for another minute.

 

This isn't confined to steps with connections. It also affects steps that read fields from previous steps, such as the "Select values" step, which doesn't have a connection field but freezes if any step before it uses a connection.

 

The "Table Input" step doesn't have this problem.

 

I've been using a workaround for about a year: I set up an SSH tunnel that actually does connect to those DBs, connect through the tunnel while developing the ETLs, and then switch the connections at the very end, even though it takes 15+ minutes to go in and change them because of all the freezes.

 

Right now I'm doing a task that is simply updating connection configs for multiple jobs, and instead of taking 30 minutes like it should've, it's been taking hours just waiting on these connections to fail.

 

Is there any way of preventing PDI from freezing or even trying to connect?


Carlos Lopez

Are there queries being recorded in the database logs? Or does this happen just when PDI attempts to connect to the databases?

Luis Villegas

There are no queries in the logs. It happens when attempting to connect to the database, and it doesn't happen for every step, just some.

 

The most common ones are the Table Output steps and Dimension lookup/update steps. Example: When I open a Dimension Lookup step that points to one of these Databases that I know I can't access locally, it freezes up all of PDI and its UI for about 60-100 seconds, then the spoon log outputs:

 

org.pentaho.di.trans.steps.dimensionlookup.DimensionLookupMeta@7f4e5a39 - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : Unable to get fields from previous steps because of an error

 

Once the UI is responsive again, I click on the "Connections" dropdown, and it freezes for another 60-100 seconds, and so on.

 

If I open a "Select values" step that has an input step upstream that uses a connection I know I can't locally access, it will also freeze the UI for 1-2 minutes, and eventually gives the same exact message as the Dimension lookup step above.

 

Output steps are the same. How long the UI freezes seems to depend on the number of upstream steps that try to connect to databases that PDI can't currently reach.
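
In case anyone wants to compare these freeze times with a plain connection attempt outside Spoon, here is a rough Python sketch. It assumes the freeze is simply the client waiting for a network connect to fail, which is a guess and not something confirmed from the PDI code. The function name `time_connect` is made up for illustration, and the check only covers TCP reachability, not the JDBC driver's own login timeout.

```python
import socket
import time


def time_connect(host, port, timeout=5.0):
    """Try a plain TCP connect and report (succeeded, seconds_elapsed).

    This only checks whether the port is reachable; it does not speak
    any database protocol and knows nothing about PDI or JDBC.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts and DNS
        # failures) if the connection cannot be established in time.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start
```

Calling `time_connect(host, port, timeout=120)` with the host and port of one of the unreachable databases and comparing the elapsed time with the 60-100 second freezes would show whether the two line up. If they do, the wait is probably happening at the network level and not in the step dialog itself.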

 

Strangely enough, Table Input steps don't have this problem, and neither do database merge steps.

 

It took me 1 hour to update the DB connections in certain steps of a single ETL transformation, because each connection change took around 10 minutes with the frozen UI. It's a task that would have taken less than 3 minutes without this issue, so this is a huge productivity drain right now.

Luis Villegas

Almost forgot...on some occasions, if I try clicking on the UI while it's loading/frozen, I'll get the following error in the log:

 

org.eclipse.swt.SWTException: Failed to execute runnable (org.eclipse.swt.SWTException: Widget is disposed)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Unknown Source)
    at org.eclipse.swt.widgets.Display.runAsyncMessages(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1384)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7949)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9331)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:710)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
Caused by: org.eclipse.swt.SWTException: Widget is disposed
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.widgets.Widget.error(Unknown Source)
    at org.eclipse.swt.widgets.Widget.checkWidget(Unknown Source)
    at org.eclipse.swt.widgets.Text.getText(Unknown Source)
    at org.pentaho.di.ui.core.widget.TextVar.getText(TextVar.java:202)
    at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.isConnectionSupported(TableOutputDialog.java:1077)
    at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.lambda$validateSelection$0(TableOutputDialog.java:1071)
    at org.eclipse.swt.widgets.RunnableLock.run(Unknown Source)
    ... 12 more