Pentaho


 Kettle Jobs Won't Run Transformations

Luis Villegas posted 08-22-2019 02:40

Ok, I've been pulling my hair out over this, and can't find anything anywhere.

 

I've got a Pentaho Server (8.3 CE) running on an AWS EC2 server. I have PDI 8.3 on my local computer. I set up a Pentaho Repository on the Pentaho Server.

 

I've created a bunch of transformations, and they all run without issue. However, if I put a transformation in a job, the job doesn't run the transformation, but says it finished successfully.

 

I've tried this with many different transformations, and none work. The job does run every other step, including running other jobs (as long as those jobs don't contain transformations)!

 

I've tried creating a new job from scratch and a new transformation from scratch that just outputs a simple text file. In the job, it's literally just "Start -> Transformation -> Dummy". The transformation runs fine by itself, but when the job runs and finishes successfully, the transformation doesn't run!

 

Looking at the logs, I notice it says:

 

Transformation - Opening transformation: [null] in directory

No matter what I do, it always says [null] for the transformation. I've tried the following:

  • Tried different transformations
  • Created transformations in different parts of the Pentaho repository
  • Created new jobs
  • Ran locally (this works on my local computer, but not on the server)
  • Created a new run configuration (Engine: Pentaho, Settings: Pentaho Server); doesn't work
  • Changed the transformation's Run Configuration; still doesn't work
  • Tried all the Transformation step execution options and their different combinations; nothing
  • Tried using ${Internal.Entry.Current.Directory} for a relative path to the transformation; still the same behavior
  • Tried an absolute repo path and an absolute filesystem path; neither worked (the filesystem path errors out, obviously, since it's in a Pentaho repo)
  • Tried running the job directly from the Pentaho BI server dashboard by double-clicking the job file; still the same log output
  • Tried PDI (Spoon) on a different computer/OS with a clean install, just connecting to the repo; still a no-go
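
For reference, here's roughly how a Transformation step gets stored in the job's XML when saved (I'm paraphrasing the .kjb entry format from memory, so exact fields may vary by version; my_transformation is a placeholder name). The [null] in the log looks like the transformation-name field below failing to resolve at runtime:

```xml
<entry>
  <name>Transformation</name>
  <type>TRANS</type>
  <!-- how the transformation is referenced: filename, rep_name, or rep_ref -->
  <specification_method>rep_name</specification_method>
  <trans_object_id/>
  <filename/>
  <!-- this is the value that shows up as [null] in the log -->
  <transname>my_transformation</transname>
  <directory>${Internal.Entry.Current.Directory}</directory>
</entry>
```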

 

I've been stuck on this seemingly simple thing for days, and I don't know what to even try next at this point. The only other place I've seen this is here:

 

https://stackoverflow.com/questions/53411939/pentaho-di-opening-transformation-null-in-directory

 

And there's no answer....


#Kettle
#Pentaho
#PentahoDataIntegrationPDI
Paulo Pires

Hi Luis,

 

Try using ${Internal.Job.Repository.Directory} instead of ${Internal.Entry.Current.Directory}; I'm not sure it will work, though.

 

Best regards

Luis Villegas

Just tried it, still didn't work. The log still says:

 

2019/08/22 15:37:32 - Transformation - Starting job entry

2019/08/22 15:37:32 - Transformation - Opening transformation: [null] in directory [/public/transformations]

2019/08/22 15:37:33 - Transformation - Starting transformation...(file=null, name=Transformation, repinfo=null)

2019/08/22 15:37:33 - Transformation - Using run configuration [AWS EC2]

Johan Hammink

Where are the transformations stored? It is now looking in the directory /public/transformations. The variable ${Internal.Job.Repository.Directory} is pointing to that directory.

If the transformation is stored in the /home/admin folder, for example, you can also point to it as /home/admin/transformation_name

Luis Villegas

I tried different structures, but just to make it simple to test, I have my job and transformation both in the same folder: /public/development

 

I've tried using:

  • ${Internal.Entry.Current.Directory}/transformation_name
  • ${Internal.Job.Repository.Directory}/transformation_name
  • /public/development/transformation_name

 

Regardless, the job log says it's running transformation "null" in that directory. No errors or anything.

 

Like I said before, if I create another job in that directory that simply emails me, and I make it a step for this job, the other job runs fine and emails me, but not a single transformation runs. It just doesn't make sense!

 

However, the job AND transformation run only if, instead of choosing "Pentaho Server" in the run configuration, I choose "Slave Server" and check the box for "Send Resources to Slave Server". Then it works perfectly. I'm definitely running on a Pentaho Server, so I'm not sure why only "Slave Server" works. If I go to the Pentaho dashboard on the actual server (not my local Spoon client) and run the job from the dashboard, the job runs successfully, but no transformation is run.

Christopher Riccardi

What do you have on your Kettle Status Page for a Repository Name? It may be necessary to name the repository. For example:

 

[Screenshot: Kettle Status Page showing a named repository]

Luis Villegas

Hello Chris,

 

I have

 

Repository name: singleDiServerInstance

Christopher Riccardi

Hi Luis,

 

That is the default name. Sometimes, when you have jobs calling other jobs or transformations from the repository, it is necessary to give the repository an explicit name. This is done by editing the slave-server-config.xml file and choosing a name that matches what you have in the repositories.xml file.

 

Note my file contents, which give me the repository name I have displayed in my screenshot.

 

Excerpt of slave-server-config.xml

<slave_config>
  <repository>
    <name>LocalOra12</name>
    <username>admin</username>
    <password>password</password>
  </repository>
  ...

Excerpt of corresponding entry in repositories.xml

 

<repositories>
  <repository>
    <id>PentahoEnterpriseRepository</id>
    <name>LocalOra12</name>
    <description>Pentaho repository | http://localhost:8080/pentaho</description>
    <is_default>false</is_default>
    <repository_location_url>http://localhost:8080/pentaho</repository_location_url>
    <version_comment_mandatory>N</version_comment_mandatory>
  </repository>
  ...

 

 

Luis Villegas

So the slave-server-config.xml is found on the pentaho server in pentaho-server/pentaho-solutions/system/kettle/slave-server-config.xml

 

For repositories.xml, I know there's one on my local machine that runs Spoon, and one on the pentaho server. Do both need to match? Are both repositories.xml relevant?

 

Thanks!

Christopher Riccardi

I would ensure that you have a consistent entry across all three: the client repositories.xml, the server repositories.xml, and the server slave-server-config.xml. That should cover all the execution scenarios.

 

Then stop the server, clear the server cache (pentaho-server/pentaho-solutions/system/karaf/caches), and restart. Then check the Kettle Status Page to ensure the new name has taken.
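
On a Linux install, that sequence looks roughly like this (assuming pentaho-server lives under /opt/pentaho; adjust the paths to your EC2 instance):

```shell
# assuming pentaho-server is installed under /opt/pentaho; adjust as needed
cd /opt/pentaho/pentaho-server
./stop-pentaho.sh                                # stop the server
rm -rf pentaho-solutions/system/karaf/caches/*   # clear the Karaf cache
./start-pentaho.sh                               # restart
# then open http://<your-server>:8080/pentaho/kettle/status
# and check that the new repository name appears
```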

 

Even if that doesn't rectify the problem, it's one less potential variable to deal with.

Luis Villegas

So I did everything you suggested. I can run the job remotely from the spoon client by using a Run Configuration where for settings I use "Slave Server", but the job still can't find the transformation (without any errors) if I choose "Pentaho Server" for the Run Configuration. It also doesn't successfully run if I go through the server's dashboard, or schedule the job on the dashboard. It still thinks the transformation is [null].

 

I did try something that gave me a different result, though. I created two Run Configurations: one is "Pentaho Server", the other is "Slave Server", with the slave server being the same Pentaho Server that I'm connected to and that contains the repo. I set the transformation step to run using the "Slave Server" configuration, and ran the job using the "Pentaho Server" run config. This time it worked!

 

It seems everything works as expected only if I set all the Transformation steps to run with the "Slave Server" setting, but run the job itself with the "Pentaho Server" run config. It's weird, but it works. So I guess from now on, all Transformation steps in jobs I create must use the "Slave Server" setting in their run configuration.

 

Camila Barreto

Same problem. Did anyone find a solution? I am stuck with this because most of my processes deal with more than one transformation, and I'm getting exactly the same log: "Opening transformation: [null] in directory [/public]", where /public is the correct folder in my repository. So referencing the repository doesn't seem to be the problem...

Luis Villegas

To get it to work, I had to do the following workaround:

  • In PDI, add a new Slave Server. I did this by opening the job I'm working on, right-clicking the "Slave Server" folder on the left-hand side, and selecting "New". Use the connection details of the Pentaho Repository you're already using
  • Create a new Run Configuration (right click "Run Configurations" in the job view and select "new")
  • For the Run Config, I used
    • Engine: Pentaho
    • Settings: Slave Server
    • Location: The slave server created above that points to Pentaho server
    • Send resources to this server: Unchecked
  • Then, in the job, go into each Transformation step, and choose the Run Configuration you created. Check the box that says "Wait for remote transformation to complete", and in the "Parameters" tab, check the box "Pass parameter values to sub-transformations"

 

Setting it up like this, I was able to run the jobs without an issue.
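
For reference, the Slave Server definition created in the first step ends up stored in the job's XML along these lines (the hostname and credentials below are placeholders, not my actual values, and the exact fields may differ slightly by version):

```xml
<slaveserver>
  <name>pentaho-server</name>
  <!-- placeholder host; point this at your actual Pentaho Server -->
  <hostname>my-ec2-host.example.com</hostname>
  <port>8080</port>
  <!-- webappname must match the server's web app (default: pentaho) -->
  <webappname>pentaho</webappname>
  <username>admin</username>
  <password>password</password>
  <master>N</master>
</slaveserver>
```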