Pentaho

  • 1.  Add Checksum output problem when changing Java or Pentaho version

    Posted 08-14-2022 06:18
    Hello, I am getting different MD5 hash output from the Add Checksum step. I use the MD5 hash to detect changes in data from source systems that I regularly load into my data warehouse. I am currently using an old Pentaho version 6 with Java version 1.8.0_231, and I want to upgrade both to the newest versions. After updating even just one of them, I get a lot of changed data rows (I have a very big DWH) because the MD5 hash computed on the same data differs, even though the values in the rows did not change. What causes this, and what is the best way to upgrade without loading millions of "changed" rows?

    ------------------------------
    Adam Makara
    Systems Engineer
    DWH
    ------------------------------


  • 2.  RE: Add Checksum output problem when changing Java or Pentaho version

    Posted 08-14-2022 21:21

    Hi Adam

    I'd check very carefully that the data is coming through in exactly the same way in the old install and the new one. If you are including floats in the data for the hash, their values may vary slightly depending on CPU/OS factors. You might try hashing a row after forcing the fields to definite values and see if the difference still exists.
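
    To illustrate the float point outside of Pentaho, here is a minimal Python sketch. It assumes the checksum is computed over a text rendering of the value, which may not match what the Add Checksum step does internally. The helper names md5_hex and normalize are hypothetical, as is the choice of four decimal places.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 encoded string as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def normalize(value: float, places: int = 4) -> str:
    """Force a float to a fixed number of decimals before hashing.

    The number of places is an assumption; pick whatever precision
    the source data actually guarantees.
    """
    return format(value, f".{places}f")


value = 0.1

# Two text renderings of the same double: shortest form vs. 17 significant digits.
rendering_a = repr(value)
rendering_b = format(value, ".17g")

# Same number, different text, so the hashes differ.
print(md5_hex(rendering_a) == md5_hex(rendering_b))

# After forcing both to a definite representation, the hashes agree.
fixed_a = normalize(float(rendering_a))
fixed_b = normalize(float(rendering_b))
print(md5_hex(fixed_a) == md5_hex(fixed_b))
```

    If the hashes from the old and new installs agree once the float fields are forced to a fixed format like this, the difference is most likely in how the values are rendered rather than in the data itself.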



    ------------------------------
    Andrew Cave
    Systems Engineer
    BizCubed Pty Ltd
    Australia
    ------------------------------



  • 3.  RE: Add Checksum output problem when changing Java or Pentaho version

    Posted 08-15-2022 09:57

    I have exactly the same issue, but with SHA-256. It happens when using version 9.3, so I decided not to update and stay with 9.2 for now.

    Step: Add a checksum
    Type: SHA-256
    ResultType: Hexadecimal
    Field Separator: -
    Number of fields: 7

    Same as Adam, I have a table with about 6M records. My incoming data doesn't contain a UID, so I use the checksum to calculate one.
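
    One way to narrow this down might be to compute a reference checksum outside of Pentaho for a handful of sample rows and compare it with the output of both versions. The Python sketch below mirrors the settings listed above (SHA-256, hexadecimal result, "-" as field separator). It is not Pentaho's implementation: the function name row_checksum, the UTF-8 encoding, and the rendering of missing values as empty strings are all assumptions.

```python
import hashlib
from typing import Optional, Sequence


def row_checksum(fields: Sequence[Optional[str]], separator: str = "-") -> str:
    """Join the fields with a separator and return the SHA-256 digest as hex.

    Assumptions (not taken from Pentaho): values are already strings,
    missing values become empty strings, and the text is UTF-8 encoded.
    """
    joined = separator.join("" if f is None else f for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical sample row with 7 fields, one of them missing.
sample_row = ["2022", "08", "15", "A100", None, "42.50", "OPEN"]

reference = row_checksum(sample_row)
print(reference)

# Any change in how a single field is rendered changes the whole digest,
# for example a missing value written as the text "null" instead of "".
variant = row_checksum(["2022", "08", "15", "A100", "null", "42.50", "OPEN"])
print(reference == variant)
```

    If the reference value matches one version but not the other, comparing the joined input string field by field should show which field is being rendered differently.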



    ------------------------------
    Gert Wieland
    Application Services Manager
    UHN
    ------------------------------