Pentaho

 View Only
  • 1.  Connection to repository is closed and job fails

    Posted 08-16-2022 11:32
    Hello,

    I'm running a job from command line (kitchen.sh) which starts transformation that reads a big XML file. It takes about 2 hours. When the transformation is done the job fails as connection to the repository database (postgres) is closed meanwhile so following transformations can't be read.

    Is there any way how to keep connection open?

    Following the error:

    ERROR executing query
    This connection has been closed.
    ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : org.pentaho.di.core.exception.KettleException:
    An error occured loading the directory tree from the repository

    ERROR executing query
    This connection has been closed.

    at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryDirectoryDelegate.loadRepositoryDirectoryTree(KettleDatabaseRepositoryDirectoryDelegate.java:108)
    at org.pentaho.di.repository.kdr.KettleDatabaseRepository.loadRepositoryDirectoryTree(KettleDatabaseRepository.java:608)
    at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMetaFromRepository(JobEntryTrans.java:1302)
    at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1341)
    at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:716)
    at org.pentaho.di.job.Job.execute(Job.java:693)

    Thank you
    Martin


  • 2.  RE: Connection to repository is closed and job fails

    Posted 08-18-2022 19:43
    From https://www.postgresql.org/docs/current/runtime-config-client.html

    idle_session_timeout (integer)

    Terminate any session that has been idle (that is, waiting for a client query), but not within an open transaction, for longer than the specified amount of time. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.

    Unlike the case with an open transaction, an idle session without a transaction imposes no large costs on the server, so there is less need to enable this timeout than idle_in_transaction_session_timeout.

    Be wary of enforcing this timeout on connections made through connection-pooling software or other middleware, as such a layer may not react well to unexpected connection closure. It may be helpful to enable this timeout only for interactive sessions, perhaps by applying it only to particular users.


    You should be able to set this connection option by editing the Connection and going to the Options section (in the left panel)

    Alternatively, you try using a job that calls a transform that parses the XML first, then passes the rows to the next transform which is where the postgres output is.   You can use the 'copy rows to result' and 'get rows from result' steps to pass the rows around.  that way the connection shouldn't happen until the rows arrive...hopefully

    Regards

    ------------------------------
    Andrew Cave
    Systems Engineer
    BizCubed Pty Ltd
    Australia
    ------------------------------



  • 3.  RE: Connection to repository is closed and job fails

    Posted 08-19-2022 07:31
    Thank you for your suggestions.
    So far I tried the first one, however, it didn't help.

    What I did:
    I updated the repository connection in the client application (both parameters, I even tried other parameters I found)

    Then I checked how the repositories.xml file was updated and copied the change to the server.
    So I changed /opt/etl/.kettle/repositories.xml file:
    <repositories>
    <connection>
    <name>TEST_REPOSITORY_CONNECTION</name>
    <server>SOME_IP_ADDRESS</server>
    <type>POSTGRESQL</type>
    <access>Native</access>
    <database>etl_repository</database>
    <port>5433</port>
    <username>mngmtadmin</username>
    <password>Encrypted 38457357...</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.tcpKeepAlive</code><attribute>true</attribute></attribute>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.tcp_keepalives_count</code><attribute>10</attribute></attribute>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.tcp_keepalives_idle</code><attribute>30</attribute></attribute>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.tcp_keepalives_interval</code><attribute>30</attribute></attribute>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.idle_in_transaction_session_timeout</code><attribute>
    <attribute><code>EXTRA_OPTION_POSTGRESQL.idle_session_timeout</code><attribute></attribute>30000</attribute></attribute>30000</attribute>
    <attribute>
    <code>FORCE_IDENTIFIERS_TO_LOWERCASE</code>
    <attribute>N</attribute>
    </attribute>
    <attribute>
    <code>FORCE_IDENTIFIERS_TO_UPPERCASE</code>
    <attribute>N</attribute>
    </attribute>
    <attribute>
    <code>IS_CLUSTERED</code>
    <attribute>N</attribute>
    </attribute>
    <attribute>
    <code>PORT_NUMBER</code>
    <attribute>5433</attribute>
    </attribute>
    <attribute>
    <code>PRESERVE_RESERVED_WORD_CASE</code>
    <attribute>Y</attribute>
    </attribute>
    <attribute>
    <code>QUOTE_ALL_FIELDS</code>
    <attribute>N</attribute>
    </attribute>
    <attribute>
    <code>SUPPORTS_BOOLEAN_DATA_TYPE</code>
    <attribute>Y</attribute>
    </attribute>
    <attribute>
    <code>SUPPORTS_TIMESTAMP_DATA_TYPE</code>
    <attribute>Y</attribute>
    </attribute>
    <attribute>
    <code>USE_POOLING</code>
    <attribute>N</attribute>
    </attribute>
    </attributes>
    </connection>
    <repository>
    <id>KettleDatabaseRepository</id>
    <name>TEST</name>
    <description>TEST Repository</description>
    <is_default>false</is_default>
    <connection>TEST_REPOSITORY_CONNECTION</connection>
    </repository>
    </repositories>

    Then I started the job
    ./kitchen.sh -rep=TEST -user=job_executor -pass=***** -job=LoadXMLToStage_v2 -dir=/Stage -param=ts_file_id=205 1>>/opt/etl_data/log/LoadXMLToStage_v2_1_2022-08-19_10-10-41.txt 2>&1 &

    And the result is still the same:
    File is successfully loaded to the database, but the following transformations fail due to the closed connection to the repository.

    2022/08/19 12:14:47 - data_ep.0 - Finished processing (I=0, O=20000000, R=20000000, W=20000000, U=0, E=0)
    2022/08/19 12:14:47 - LoadXMLToStage_v2 - Starting entry [Log_011]
    2022/08/19 12:30:35 - Log_011 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : Unable to run job LoadXMLToStage_v2. The Log_011 has an error. An error occured loading the directory tree from the repository
    2022/08/19 12:30:35 - Log_011 -
    2022/08/19 12:30:35 - Log_011 - ERROR executing query
    2022/08/19 12:30:35 - Log_011 - An I/O error occurred while sending to the backend.
    2022/08/19 12:30:35 - Log_011 - ERROR (version 9.2.0.0-290, build 9.2.0.0-290 from 2021-06-02 06.36.08 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2022/08/19 12:30:35 - Log_011 - An error occured loading the directory tree from the repository
    2022/08/19 12:30:35 - Log_011 -
    2022/08/19 12:30:35 - Log_011 - ERROR executing query
    2022/08/19 12:30:35 - Log_011 - An I/O error occurred while sending to the backend.
    2022/08/19 12:30:35 - Log_011 -
    2022/08/19 12:30:35 - Log_011 -
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryDirectoryDelegate.loadRepositoryDirectoryTree(KettleDatabaseRepositoryDirectoryDelegate.java:108)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.repository.kdr.KettleDatabaseRepository.loadRepositoryDirectoryTree(KettleDatabaseRepository.java:608)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMetaFromRepository(JobEntryTrans.java:1302)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1341)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:716)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:693)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:834)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.execute(Job.java:503)
    2022/08/19 12:30:35 - Log_011 - at org.pentaho.di.job.Job.run(Job.java:389)


    Martin




  • 4.  RE: Connection to repository is closed and job fails

    Posted 08-21-2022 19:12
    From the doco
    idle_session_timeout : If this value is specified without units, it is taken as milliseconds.

    So you've set the timeouts to 30 seconds.

    Valid units are "ms", "s", "min", "h", and "d"



    ------------------------------
    Andrew Cave
    Systems Engineer
    BizCubed Pty Ltd
    Australia
    ------------------------------



  • 5.  RE: Connection to repository is closed and job fails

    Posted 08-23-2022 05:06
    Hi,

    I'm sorry, my bad.
    However, the problem is still the same even after setting the correct value.

    ------------------------------
    Martin Blazek
    Consultant
    EBIDAT, s.r.o.
    ------------------------------



  • 6.  RE: Connection to repository is closed and job fails

    Posted 08-23-2022 19:14
    Hi Martin

    Create a new job, join a Transform executor entry to the Start entry.  Point that entry to a new transformation that contains your xml parser joined to a single 'copy rows to results' step.

    Back in the job, join another transform executor to the last one and point it to another new entry.  In that transformation add a 'get rows from results' step and join it to your database step.

    ------------------------------
    Andrew Cave
    Systems Engineer
    BizCubed Pty Ltd
    Australia
    ------------------------------