Pentaho


 Pentaho 9.1 Scheduler Stops Running Jobs After 10 Runs

Luis Villegas posted 11-03-2020 06:08

I have a new Pentaho 9.1 server set up, and I imported all the jobs and transformations from our Pentaho 8.2 repository. I set up the schedules through the browser using the "Manage Schedules" page, and they seemed to be running fine.

 

Unfortunately, after 10 jobs have run, the scheduler stops running jobs altogether. "Last Run" stays the same, but "Next Run" keeps updating even though the job hasn't run. There are no errors when the jobs do run: the logs show "Job execution finished" every time, and I've verified that the data is being transferred.

 

I can kick off any job manually without issue by going into "Browse Files" and double-clicking it. However, once 10 jobs have run, selecting a job on the Schedules page and hitting "Execute Now" does nothing until I restart the server.

 

I've been digging into this for over 8 hours and have learned more about how Quartz works. It's now clear to me that the issue lies somewhere between Pentaho and Quartz.

 

The 10-job limit coincides with the Quartz thread limit in the configuration. Querying the Quartz DB, I found that the `qrtz5_fired_triggers` table fills up to 10 rows (probably one per configured thread), and every row stays in the "EXECUTING" state even after the job has finished successfully. The rows in `qrtz5_triggers` are all "WAITING".
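A check along these lines can be sketched in Python. This is only a minimal sketch, not the exact query used above: it assumes the stock Quartz JDBC job store column names (`state` in `qrtz5_fired_triggers`, `trigger_state` in `qrtz5_triggers`), the helper name `count_by_state` is hypothetical, and an in-memory SQLite database with made-up rows stands in for the real PostgreSQL Quartz database.

```python
import sqlite3


def count_by_state(conn, table, state_column):
    """Return {state: row_count} for one Quartz table.

    Table and column names are interpolated into the SQL text,
    so only pass trusted constants here, never user input.
    """
    cur = conn.execute(
        f"SELECT {state_column}, COUNT(*) FROM {table} GROUP BY {state_column}"
    )
    return dict(cur.fetchall())


# Stand-in for the Quartz database (assumption: the real one is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qrtz5_fired_triggers (entry_id TEXT, trigger_name TEXT, state TEXT)"
)
conn.execute("CREATE TABLE qrtz5_triggers (trigger_name TEXT, trigger_state TEXT)")

# Made-up rows reproducing the symptom: 10 fired triggers that never leave EXECUTING.
conn.executemany(
    "INSERT INTO qrtz5_fired_triggers VALUES (?, ?, ?)",
    [(f"entry-{i}", "demo_job", "EXECUTING") for i in range(10)],
)
conn.execute("INSERT INTO qrtz5_triggers VALUES ('demo_job', 'WAITING')")

THREAD_COUNT = 10  # the configured Quartz thread limit described in the post

fired = count_by_state(conn, "qrtz5_fired_triggers", "state")
triggers = count_by_state(conn, "qrtz5_triggers", "trigger_state")

# The scheduler looks stuck when EXECUTING rows have used up every thread.
stuck = fired.get("EXECUTING", 0) >= THREAD_COUNT
print(fired, triggers, stuck)
```

Against a real server, the same two `GROUP BY` queries would be run through a PostgreSQL driver or `psql` instead of SQLite. Note that some drivers require an explicit cursor rather than `conn.execute`.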

 

The only way to get it up and running again is to restart the server. It sounds like there's something failing on the Pentaho end that's not updating the state in Quartz, but there are no errors in any of the logs.

 

I've tried recreating a job from scratch, wiping the whole schedule, and scheduling only this brand new job, but it only runs on the schedule a total of 10 times because the state still isn't updated.

 

Server Info:

  • Ubuntu 18.04 LTS (64-bit)
  • AWS EC2 instance m5.large (8 GB RAM, 2 vCPU)
  • PostgreSQL 10.14-R1 (AWS RDS db.t3.large)

 

Any suggestions would be appreciated, thanks!


#PentahoDataIntegrationPDI
#Pentaho
#Kettle
Luis Villegas

I verified that changing the Quartz config to have 50 threads will allow up to 50 jobs to run before needing a restart. That's only about 2 hours of jobs for us, though.
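The arithmetic behind that observation can be illustrated with a toy model. This is only a sketch of the symptom as described in this thread, not of Quartz internals or of the actual bug: it assumes each run occupies one worker slot and that the slot is never handed back. The function name `runs_before_stall` and the tick cap are hypothetical.

```python
def runs_before_stall(thread_count, release_slot_on_finish, max_ticks=1000):
    """Count how many scheduled runs start before no worker slot is free.

    Each tick represents one scheduled firing. The loop is capped at
    max_ticks so a healthy scheduler does not loop forever.
    """
    busy = 0
    started = 0
    for _ in range(max_ticks):
        if busy >= thread_count:
            break  # every slot is still marked busy, so nothing can start
        busy += 1
        started += 1
        # The job itself finishes fine; the question is whether its slot is freed.
        if release_slot_on_finish:
            busy -= 1
    return started


print(runs_before_stall(10, release_slot_on_finish=False))  # 10
print(runs_before_stall(50, release_slot_on_finish=False))  # 50
print(runs_before_stall(10, release_slot_on_finish=True))   # 1000 (hits the cap)
```

In this model, raising the thread count only postpones the stall, which matches the behaviour reported above: 50 threads buy 50 runs, not a fix.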

Luis Villegas

I ended up installing 8.2 on the same EC2 server (in an adjacent folder in the home directory) and pointed it at the same endpoints and the same DB. Pentaho 8.2 ran just fine and continued the schedule right where it left off. I watched the Quartz DB and verified that jobs were no longer getting "stuck" in the fired_triggers table.

 

I thought it might be the configuration files, since those were the only things the documentation listed as different, but copying the properties and settings files from 8.2 to 9.1 made no difference: the 9.1 server still showed the same Quartz scheduling bug.

 

For now, it looks like a Pentaho 9.1 bug. We'll be sticking with 8.2 for the foreseeable future.

Sergio Ribeiro

Hello @Luis Villegas​,

 

That problem was fixed under https://jira.pentaho.com/browse/BISERVER-14534

If you're an EE user, you'll get the fix in the 9.1.0.2 Service Pack.

If you're a CE user, your options, besides upgrading to EE, are building from source (the issue links to the specific commit) or waiting for the next CE version.

 

Regards,

 

Sérgio Ribeiro

Porto - Portugal