Pentaho

 View Only

 Pentaho 9.3: Truncate option in Table Output doesn't reset the Auto Increment Field

Jump to Best Answer
Prateek Udaipure's profile image
Prateek Udaipure posted 02-22-2023 05:42

Hi Community Members,

I am relatively new to Pentaho and our team is planning to upgrade from Pentaho 6.0 to 9.3 version.

The issue I am facing is related to Truncate Option in Table Output with "Return Auto Generated Key" checked.
An example of my issues is listed as below:
I have a data grid with a string data for "status" with 4 different static values(e.g. closed, open, pending, In-progress).
I am writing this data to a table column in MySQL database. Table name "sample" and column name "status" with "Truncate Table" and "Return Auto generated key (Name: pk_status)" options checked.

On MySQL database table side, I have 2 columns in the table "sample". 
1. pk_status - Auto Increment column
2. status - to store static values.

When I run the Pentaho job first time, I get the output as expected.

pk_status    status
1                   closed
2                  In-progress
3                  open
4                  pending

Though, when I run the job again I get below output. 

pk_status    status
5                  closed
6                  In-progress
7                  open
8                  pending

Here the auto increment column values have been incremented rather than getting reset to 1 after the truncate.
Upon checking the logs in Heidi sql with Pentaho 9.3 run (unable to do so for Pentaho 6.0 version), I observed "DELETE FROM sample" is getting executed even though truncate option has been checked.

With Pentaho 6.0 outputs have always been as expected and increment gets reset to 1. The issue is only with Pentaho version 9.3.

Appreciate your response if someone has seen this behavior. Is this by design or a bug?

Stephen Donovan's profile image
Stephen Donovan Best Answer

I don't have an answer, but I think I have a workaround.  In your connection, change from the MySQL dialect to the Generic Database dialect.  Specifiy the URL and driver class (likely com.mysql.cj.jdbc.Driver).  Then select MySQL in the dialect drop down at the top.  If you forget this last step, you will get DELETE statements from the Generic Dialect.  
Use Generic connection and specify the dialect.

https://github.com/pentaho/pentaho-kettle/blob/9.4/core/src/main/java/org/pentaho/di/core/database/GenericDatabaseMeta.java#L116-L121

I am not sure why this works and using the MySQL dialect directly does not.  More reseach needed.

Stephen Donovan's profile image
Stephen Donovan

Not an answer yet. 

I have confirmed this behavior in 7.1, 8.3.0.10 and 9.4.  With the current Mysql 8 Driver and the older version 5 org.gjt.mm.mysql.Driver.

Stephen Donovan's profile image
Stephen Donovan

Duplicate details removed.

Stephen Donovan's profile image
Stephen Donovan

My initial tests were invalid.  I had the select in the same Transform as the TableOutput.  This creates a potential shared connection group which forced the DELETE due to parallel thread use.
https://github.com/pentaho/pentaho-kettle/blob/8.3/core/src/main/java/org/pentaho/di/core/database/Database.java#L3325-L3336

I cleaned it up to a simple data grid and insert with truncate.  It now works in 7.1 and 8.3.0.10.   It looks like this was the result of a defect fix in 9.1 fixed in 9.3.   It also explains why the Generic Database is able to override it.    https://pentaho-public.atlassian.net/browse/BISERVER-14546

I will file a defect for your issue with the workaround.

Please confirm if the workaround was successful.

Prateek Udaipure's profile image
Prateek Udaipure

Thank you @Stephen Donovan , the workaround works well with truncating a table.

However, it causes an issue with "Select Step".
For example, if we have a table input with a SQL query:
SELECT
A,
B,
C,
D
from tableName;
If we preview the result in we will get "Field2","Field3","Field4","Field5" instead of "A","B","C""D" as output fields with Dialect as "MySQL". With "Generic" Dialect the result is as expected with correct field names.

With having too many transformations & jobs, it becomes difficult to rename the field names manually.
Please let me know if you have similar observations.

Stephen Donovan's profile image
Stephen Donovan

I am not seeing that.  I am on MySQL 8.0.30 with the same driver.  Make sure you don't have a driver mismatch.

If you are using shared DB connections and consitent naming, the toggle of the driver and even the Type from MySQL to Generic and back should be transparent.

Stephen Donovan's profile image
Stephen Donovan

Note the change being reverted was here.
https://github.com/pentaho/pentaho-kettle/blob/9.4.0.0/core/src/main/java/org/pentaho/di/core/database/Database.java#L3303

I am not CE user or able to help you compile a fix that that code in 9.4.  But there may be members that can get you down that road if desired.