Pentaho

 How to get directories in a Pentaho repository from Linux command line tool?

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Alfredo Burgos posted 08-30-2019 08:46

I’m unable to list the directories of a Pentaho repository from a Linux command-line tool. As a consequence, I’m unable to run the transformations and/or jobs stored in that repository.

 

These are the steps I’ve taken so far:

  1. On a Windows machine I’ve installed a PDI client (pdi-ce-8.2.0.0-342, Windows version). From that installation I can launch the Spoon GUI, where I can design and run both jobs and transformations without problems.
  2. Then, on a Linux machine, I’ve set up a Pentaho repository. I can connect to it from the Spoon GUI and run both jobs and transformations stored in it.
  3. Moreover, I can successfully invoke the jobs and transformations stored in that repository from a Windows command-line tool.
  4. Then, on the same Linux machine that hosts the Pentaho repository, I installed a PDI client (pdi-ce-8.2.0.0-342, Linux version) to check whether I could invoke the jobs and transformations stored in the repository from a Linux command-line tool.

 

After installing the Linux pdi-ce client, and following the instructions described here, I’m trying to discover the Pentaho repository to check whether I can invoke the jobs and transformations stored there. The following command runs successfully:

 

$ pan.sh -listrep

 

#######################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
log4j:WARN Continuable parsing error 45 and column 76
log4j:WARN Element type "rollingPolicy" must be declared.
log4j:WARN Continuable parsing error 52 and column 14
log4j:WARN The content of element type "appender" must match "(errorHandler?,param*,layout?,filter*,appender-ref*)".
log4j:WARN Please set a rolling policy for the RollingFileAppender named 'pdi-execution-appender'
16:02:23,183 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled
16:02:23,339 INFO [KarafInstance]
*******************************************************************
*** Karaf Instance Number: 2 at /home/usu/telemed/data-integration/./system
***   /karaf/caches/pan/data-1
*** FastBin Provider Port:52902
*** Karaf Port:8803
*** OSGI Service Port:9052
*******************************************************************
ago 08, 2019 4:02:24 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
[omitted]
2019/08/08 16:02:37 - Pan - Start of run.
2019/08/08 16:02:37 - RepositoriesMeta - Reading repositories XML file: /home/usu/telemed/.kettle/repositories.xml
#1 : myRepository [PentahoRepository@https://pentaho.uites.isciii.es]
[omitted]

That is, the repository name (myRepository) is discovered. I’d like to point out, though, that to achieve this I had to manually copy the file %USER_HOME%\.kettle\repositories.xml from the Windows machine to the $USER_HOME/.kettle directory on the Linux machine. That step isn’t mentioned in the guide I was following, and I’m not entirely sure I should have taken it.

The problems arise when I try to retrieve the names of the directories created within the repository. Running the following command produces the errors below:

 

$ kitchen.sh -rep:myRepository -listdir

 

#######################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
log4j:WARN Continuable parsing error 45 and column 76
log4j:WARN Element type "rollingPolicy" must be declared.
log4j:WARN Continuable parsing error 52 and column 14
log4j:WARN The content of element type "appender" must match "(errorHandler?,param*,layout?,filter*,appender-ref*)".
log4j:WARN Please set a rolling policy for the RollingFileAppender named 'pdi-execution-appender'
16:19:47,362 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled
16:19:47,531 INFO [KarafInstance]
*******************************************************************
*** Karaf Instance Number: 2 at /home/usu/telemed/data-integration/./system
***   /karaf/caches/kitchen/data-1
*** FastBin Provider Port:52902
*** Karaf Port:8803
*** OSGI Service Port:9052
*******************************************************************
ago 08, 2019 4:19:48 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
2019/08/08 16:19:49 - Kitchen - Start of run.
2019/08/08 16:19:49 - RepositoriesMeta - Reading repositories XML file: /home/usu/telemed/.kettle/repositories.xml
2019/08/08 16:19:49 - PurRepositoryConnector - Creating security provider
2019/08/08 16:19:49 - PurRepositoryConnector - Creating repository sync web service
2019/08/08 16:19:49 - PurRepositoryConnector - Creating repository web service
2019/08/08 16:19:49 - PurRepositoryConnector - Creating session sync web service
ago 08, 2019 4:19:52 PM com.sun.xml.ws.api.streaming.XMLStreamReaderFactory$Woodstox <init>
WARNING: Expected property not found in Woodstox input factory: {0}
2019/08/08 16:19:52 - PurRepositoryConnector - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Failure access to WSDL at: https://pentaho.uites.isciii.es/pentaho/webservices/repositorySync?wsdl. Ha fallado con:
2019/08/08 16:19:52 - PurRepositoryConnector - Connection refused.
2019/08/08 16:19:52 - PurRepositoryConnector - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : javax.xml.ws.WebServiceException: Failure access to WSDL at: https://pentaho.uites.isciii.es/pentaho/webservices/repositorySync?wsdl. Ha fallado con:
2019/08/08 16:19:52 - PurRepositoryConnector - Connection refused.
2019/08/08 16:19:52 - PurRepositoryConnector - at …[omitted]

To sum up, I don’t know why I cannot get the list of the Pentaho directories, nor whether the steps I’ve taken are consistent. I’d really appreciate it if a Pentaho DI expert could clarify whether what I’ve done is right, and how I can sort this problem out.


Alfredo Burgos

Does anyone have experience with discovering PDI repositories from a Linux command-line tool?

Alfredo Burgos

My post is about the problems I'm having when interacting with a Pentaho repository. As I said, I’m unable to run the transformations and/or jobs located in that repository. And you are suggesting that I run a transformation... Great!

In fact, my problems start before I even try to run transformations. As I said, I'm unable to read the directories of a repository. And if I cannot read the directories where the transformations are stored, how do you expect me to invoke transformations?

David da Guia Carvalho

I'm sorry, I misread it... and assumed that you were dealing with some weird problem with the "repository list" and permissions...

Alfredo Burgos

But what sort of problem could it be? As I said, it all works fine when the commands are run in Windows-based environments. The problems only arise in Linux-based environments. I don't know whether the fact that the client and the server are installed on the same machine has anything to do with it.

David da Guia Carvalho

OK, let's slice the problem...

Sometimes "listdir" doesn't work even when you can connect and execute jobs/transformations.

1 - Can you reach the repository server "https://pentaho.uites.isciii.es" from the Linux PDI machine (telnet pentaho.uites.isciii.es 443 / openssl s_client ...)?

2 - Does your Java trust the certificate from pentaho.uites.isciii.es (cacerts)?

3 - Can you make a simple transformation (ktr) that connects to "https://pentaho.uites.isciii.es" and consumes the API?

4 - Can you execute a ktr/kjb (exported from your Windows machine) on the Linux machine?

5 - Can you execute a transformation at the root (/) of your repository?

(I know you said that you can't, but please post the output anyway.)
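
Checks 1 and 2 can be scripted. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection), the coreutils timeout command and the openssl client are installed; tcp_open and show_cert are hypothetical helper names made up for this illustration, not part of PDI. Note that show_cert only displays the certificate the server presents; whether Java trusts it still depends on the cacerts file.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are invented for this example.

# Return 0 if a TCP connection to host ($1) and port ($2) succeeds within 5 seconds.
tcp_open() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print subject, issuer and validity dates of the certificate served at host:port.
show_cert() {
  echo | openssl s_client -connect "$1:$2" -servername "$1" 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Example usage:
#   tcp_open pentaho.uites.isciii.es 443 && show_cert pentaho.uites.isciii.es 443
```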

 

Alfredo Burgos

First of all, I'd like to thank you for the time you're investing in trying to help me out.

 

Regarding the "listdir" command, I'd like to clarify that it only works in Windows environments. In Linux environments it has never worked.

 

(1) I can reach the Pentaho server both over HTTPS and with telnet:

https://pentaho.uites.isciii.es (browser on Linux and Windows systems)

telnet pentaho.uites.isciii.es 443 (Linux command-line tool)

 

(2) I don't know how to check whether my Java trusts the certificate from my Pentaho server. Could you tell me how to find out?

(3) Could you explain how to do what you're suggesting? I think it has to be done within Pentaho Spoon, but... which step should I use? (I'm really sorry, but I'm a newcomer to the Pentaho-Kettle world.)

 

(4) I can confirm that, in the Linux environment, I'm able to run a ktr/kjb that had previously been exported from the Windows environment. The command I've used is the following:

pan.sh -norep -file <myktrfile>

 

(5) As far as I know (I may be wrong), Pentaho repositories are sandboxed, that is, you don't know where your repository is located. As a consequence, you cannot invoke a ktr/kjb located at the root of the repository.

 

Looking forward to reading a reply from you.

Thanks.

 

David da Guia Carvalho

2) Adding certificates to cacerts is a common task when using HTTPS, so you have to find which cacerts file your Java installation uses (there might be more than one) and add the certificate to it. To do that, you can use "keytool" to inspect cacerts and add the certificate if needed.

https://knowledge.digicert.com/solution/SO4085.html

https://plone.lucidsolutions.co.nz/linux/java/how-to-add-a-certificate-authority-ca-certificate-to-the-openjdk-cacerts

https://www.sslshopper.com/article-most-common-java-keytool-keystore-commands.html
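
As a rough illustration of the keytool route, the sketch below first locates candidate cacerts files and then shows, in comments, the listing and import commands. It assumes JAVA_HOME points at the Java installation that PDI actually uses (PDI's launch scripts may pick a different one, for example via PENTAHO_JAVA_HOME) and that the keystore still has the default password "changeit". find_cacerts is a hypothetical helper name, and the alias and certificate file name are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: find_cacerts is an invented helper name.

# Print every cacerts file found under the given Java home ($1).
# A Java 8 JDK keeps it under jre/lib/security; a JRE and newer JDKs use lib/security.
find_cacerts() {
  for f in "$1/jre/lib/security/cacerts" "$1/lib/security/cacerts"; do
    [ -f "$f" ] && echo "$f"
  done
  return 0
}

# Example usage (alias and server.crt are placeholders):
#   find_cacerts "$JAVA_HOME"
#   keytool -list -keystore /path/to/cacerts -storepass changeit | grep -i pentaho
#   sudo keytool -importcert -alias pentaho-server -file server.crt \
#        -keystore /path/to/cacerts -storepass changeit
```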

 

3) You can use a simple REST step (or any other HTTP step). If your Spoon, Pan or Kitchen can reach the server with it, it should be able to connect to your repository.

5) It can be any job/transformation in any directory; it just needs to be one whose location you are sure of. So, something along the lines of:

 ./kitchen.sh -rep=myRep -user=MyUseradmin -pass=MyPass -dir=/home/MyUseradmin -job=myJob_test
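
Since the same repository options are repeated on every Kitchen call, a small wrapper can help when exploring a repository step by step. This is a sketch only: repo_cmd and the KITCHEN, REP, REP_USER and REP_PASS variables are names invented for this example, and their defaults are the placeholder values from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around kitchen.sh; adjust KITCHEN to your install path.
KITCHEN="${KITCHEN:-./kitchen.sh}"
REP="${REP:-myRep}"
REP_USER="${REP_USER:-MyUseradmin}"
REP_PASS="${REP_PASS:-MyPass}"

# Run Kitchen with the repository credentials plus any extra options.
repo_cmd() {
  "$KITCHEN" -rep="$REP" -user="$REP_USER" -pass="$REP_PASS" "$@"
}

# Example usage, from the data-integration directory:
#   repo_cmd -listdir                                # directories in the repository
#   repo_cmd -dir=/home/MyUseradmin -listjobs        # jobs in one directory
#   repo_cmd -dir=/home/MyUseradmin -job=myJob_test  # run a job
#   echo "exit code: $?"                             # 0 means the run finished without errors
```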

 

Alfredo Burgos

In an attempt to simplify the problem, I've decided to remove the security on the server. Then I changed the content of $(USER_HOME)/.kettle/repositories.xml so that the <repository_location_url> element takes the value http://pentaho.uites.isciii.es/pentaho. After restarting the Pentaho server and launching the following command, I get the following error:

 

$ kitchen.sh -rep:myRepository -user:admin -pass:password -listdir

 

PurRepositoryConnector - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Fallo al acceder al WSDL en: http://pentaho.uites.isciii.es/pentaho/webservices/unifiedRepository?wsdl. Ha fallado con:
PurRepositoryConnector - Conexión rehusada (Connection refused).
Processing stopped because of an error: java.lang.Exception: Failed to connect to a Pentaho Server Instance. Please check your server connection information and make sure your server is running.
Failed to connect to a Pentaho Server Instance. Please check your server connection information and make sure your server is running.
[PurRepositorySecurityProvider] Unable to initialize User Role list webservice
javax.xml.ws.WebServiceException: Fallo al acceder al WSDL en: http://pentaho.uites.isciii.es/pentaho/webservices/userRoleListService?wsdl. Ha fallado con: Conexión rehusada (Connection refused).
 at com.sun.xml.ws.wsdl.parser.RuntimeWSDLParser.tryWithMex(RuntimeWSDLParser.java:265)
 at com.sun.xml.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:246)
 ...
 at com.sun.xml.ws.client.WSServiceDelegate.parseWSDL(WSServiceDelegate.java:363)
 ...
Caused by: java.net.ConnectException: Conexión rehusada (Connection refused)
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at java.net.Socket.connect(Socket.java:538)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
 at sun.net.www.http.HttpClient.New(HttpClient.java:339)
 at sun.net.www.http.HttpClient.New(HttpClient.java:357)
 at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
 at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
 at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
 at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
 at java.net.URL.openStream(URL.java:1057)
 at com.sun.xml.ws.wsdl.parser.RuntimeWSDLParser.createReader(RuntimeWSDLParser.java:999)
 at com.sun.xml.ws.wsdl.parser.RuntimeWSDLParser.resolveWSDL(RuntimeWSDLParser.java:400)
 at com.sun.xml.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:231)
 ... 15 more

Since I can connect to the Pentaho server over both HTTP and telnet (http://pentaho.uites.isciii.es in a browser, and the command 'telnet pentaho.uites.isciii.es 8080'), I still don't know why I can't get the list of directories in the Pentaho repository.

 

Any idea/suggestion? Thanks.
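
A browser test can succeed for reasons that do not apply to Kitchen (a different machine, a proxy, a redirect), so it may help to request the exact WSDL URL from the error message in the same Linux shell that runs kitchen.sh. A minimal sketch, assuming curl is installed; wsdl_url and http_status are hypothetical helper names, and the /webservices/unifiedRepository?wsdl suffix is copied from the log above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are invented for this example.

# Build the WSDL URL that the PurRepositoryConnector error reports for a base URL ($1).
wsdl_url() {
  echo "${1%/}/webservices/unifiedRepository?wsdl"
}

# Print the HTTP status code for a URL ($1); curl prints 000 if no connection could be made.
http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Example usage:
#   http_status "$(wsdl_url http://pentaho.uites.isciii.es/pentaho)"
#   http_status "$(wsdl_url http://pentaho.uites.isciii.es:8080/pentaho)"
```

A 000 result for the first URL together with an HTTP status for the second would point at the port rather than at PDI itself.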

David da Guia Carvalho

Port problem?

You state: "using a browser, and typing the command 'telnet pentaho.uites.isciii.es 8080"

But the connection is being attempted to "http://pentaho.uites.isciii.es/" (port 80).

Could your repository configuration be missing the 8080?
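
To illustrate the point about default ports: a URL without an explicit port implies 80 for http and 443 for https, so a repository URL written without ":8080" does not go to port 8080. The helper below is a sketch with an invented name (url_port); it handles plain host names and IPv4 addresses only, not bracketed IPv6 literals.

```shell
#!/usr/bin/env bash
# Sketch only: url_port is an invented helper name.

# Print the TCP port a URL ($1) implies: the explicit one if present,
# otherwise 80 for http and 443 for https.
url_port() {
  url="$1"
  scheme="${url%%://*}"     # text before "://"
  rest="${url#*://}"        # text after "://"
  hostport="${rest%%/*}"    # host[:port], without the path
  case "$hostport" in
    *:*) echo "${hostport##*:}" ;;
    *)   case "$scheme" in
           https) echo 443 ;;
           *)     echo 80 ;;
         esac ;;
  esac
}

url_port http://pentaho.uites.isciii.es/pentaho        # prints 80
url_port http://pentaho.uites.isciii.es:8080/pentaho   # prints 8080
```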

 

Alfredo Burgos

All of these work:

  • telnet pentaho.uites.isciii.es 80
  • telnet pentaho.uites.isciii.es 8080
  • http://pentaho.uites.isciii.es

 

Here is the content of the $(USER_HOME)/.kettle/repositories.xml file:

 

<?xml version="1.0" encoding="UTF-8"?>
<repositories>
  <repository>
    <id>PentahoEnterpriseRepository</id>
    <name>myRepository</name>
    <description>PentahoRepository@http://pentaho.uites.isciii.es</description>
    <is_default>true</is_default>
    <repository_location_url>http://pentaho.uites.isciii.es/pentaho</repository_location_url>
    <version_comment_mandatory>N</version_comment_mandatory>
  </repository>
</repositories>

 

As can be seen, there is no port configuration in it at all.
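
For a quick look at which URL each repository entry points to, the name and location can be pulled out of repositories.xml with sed. This is a sketch, not a general XML parser: list_repo_urls is a hypothetical helper name, and it assumes each <name> and <repository_location_url> element sits on its own line and that every repository has both, as in the file above.

```shell
#!/usr/bin/env bash
# Sketch only: list_repo_urls is an invented helper name.

# Print "name<TAB>url" for each repository in a repositories.xml file ($1).
list_repo_urls() {
  sed -n \
    -e 's:.*<name>\(.*\)</name>.*:\1:p' \
    -e 's:.*<repository_location_url>\(.*\)</repository_location_url>.*:\1:p' \
    "$1" | paste - -
}

# Example usage:
#   list_repo_urls "$HOME/.kettle/repositories.xml"
```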

David da Guia Carvalho
Alfredo Burgos

Great!! Got it to work!! I really appreciate your help and the time invested.

I still don't understand why the former configuration of the repositories.xml file was valid for discovering the repositories created on the server but not for retrieving the directories created in that repository.

Now, with the configuration you suggested, I can retrieve the directory structure of the repository and the list of jobs and/or transformations stored there and, what is more, I can execute these transformations remotely from my Linux command-line tool, which is exactly what I wanted.

 

Thank you very much indeed again!
