Pentaho

 View Only

 PDICE 9.4, transformation Miscellaneous tab, Shared objects file field, doesn't support variable

Albert Wang's profile image
Albert Wang posted 01-18-2023 05:33
Dear Sir or Madam,

Our company used Pentaho Data Integration Community Edition (PDI-CE) 8.0 (Linux). We are upgrading to the PDI-CE 9.4. In the process, we met an issue.

To reproduce the issue, clone the project https://github.com/albertwangnz/reproduce-shared-objects-file-issue-pdice-9.4.

When we used PDI-CE 8.0, we used the feature Shared objects file in the transformation Miscellaneous tab. We defined database connections in different files to support different database engines. We set up a variable DB_CONN_SHARED_FILE in a previous transformation step. The value of the variable is the location of a shared object file (XML), like sharefiles/database-connections-mysql.xml. Then we used the variable as the value of the Shared objects file field as shown below. So, we could dynamically support various database engines, like MySQL, SQL Server, with the same *.ktr files.

Variable does not work

It used to work in PDI-CE 8.0. But after we upgraded to PDI-CE 9.4, it does not work anymore. If we hard-code the value in the Shared objects file field, as shown below, like sharefiles/database-connections-mysql.xml, it works. But the variable does not work anymore.

Hard-code value works

I wonder if anyone met the same problem and knows how to fix it.

Thank you.

Regards,
Albert

* The version of PDI-CE is 9.4.0.0-343
** The version of Linux is Ubuntu 18.04.6 LTS
Stephen Donovan's profile image
Stephen Donovan

“We define a variable in the Shared objects file field, and the value of the variable is set by a Set Variables step.”

 Is $KETTLE_SHARED_OBJECTS the variable the are setting? It sounds like they setting another variable and placing it this field?   What scope are they using when setting the variable?

If you can post an example job and transform that attempts to set this and simply prints it to the log we can look into it.

Stephen Donovan's profile image
Stephen Donovan
I may have confirmed what you are seeing.  This is my stripped down version with two shared.xml files connecting to different databases.  Can you confirm this represents the issue you are referring to?
Albert Wang's profile image
Albert Wang
Hi @Stephen Donovan

Thank you for your reply. I prepared an example project on Github.

https://github.com/albertwangnz/reproduce-shared-objects-file-issue-pdice-9.4

Also I can demo it to you with Google Meeting.

Please feel free to email me albert.wang@advantive.com

Thank you.

Regards,
Albert​
Albert Wang's profile image
Albert Wang
@Stephen Donovan

Hi Stephen,

Seems like I can reply here now.

I created a Github project as shown in the question with an example of the issue.

I also checked the file KETTLE_SHARED_OBJECTS.zip.

What I am trying to do is as shown below. I want to use a variable in the Shared objects file field. By doing this, I can set the value of the variable before Kettle starts the transformation, so I can dynamically decide which database to connect to.

Albert Wang's profile image
Albert Wang
@Stephen Donovan

I can demo the issue to you if you are ok to join a Google meeting or a Zoom meeting.

Thank you.

Albert​
Stephen Donovan's profile image
Stephen Donovan

It does seem to be a defect.  Worked in 8.0, but in 8.3 it does not which continues foward to 9.4.  There is a lot of caching in those shared objects, which I believe is the culprit.

I would recommend using kettle.properties (or other property file load steps) to set variables for database connections.  The connections defined in the shared.xml would use those varaibles.  In this way, the connection details can be cached and the variables that it uses changed at runtime.

Attachment  View in library
Albert Wang's profile image
Albert Wang
Thank you, @Stephen Donovan.

As I understand your workaround, I should use Variables in database connection fixed attributes, like <server​>, <type>, <access>, etc.

This is what I did. However, I need to support different database engines, like MySQL, SQL Server, etc. Therefore, I need to set up different database connection custom attributes. I cannot combine them into one single XML file.

For example, MySQL needs
  • EXTRA_OPTION_MYSQL.defaultFetchSize
  • EXTRA_OPTION_MYSQL.useCursorFetch
  • etc

SQL Server needs
  • MSSQLUseIntegratedSecurity
  • MSSQL_DOUBLE_DECIMAL_SEPARATOR
  • etc
In PDI-CE 8.0, I had two different Shared objects files. Those two files have the same database connection (the same name) but different attributes for MySQL and SQL Server separately. And I use a Variable in all ktr files Shared objects file field, so I can decide which database engine to connect to at runtime.

But it does not work now.

Do you think there will be a bug fix and a patch release sometime? Or there won't?

Thank you.

Regards,
Albert
Stephen Donovan's profile image
Stephen Donovan

For CE, there are no patches.  This issue could not even be considered until 9.6.   

Based on changes to metadata connections and variable context that I know are being considered, I am unsure if variables in transform metadata and toggling shared.xml would even be possible.  Toggling the dialect with a shared.xml file would definately make the caching of connections a larger issue.   I simply have not seen this techinque used in the field with enterprise customers, so it is difficult to assess.

I believe there is a format through the Generic JDBC connection that you could send these optional value in the connection URL.  Understanding that the format can get more complex.  By including them all in a variable like ${JDBC_URL} you could achieve the desired outcome.

https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html

https://learn.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16