I have a new Pentaho 9.1 server set up, and I imported all the jobs and transformations from our Pentaho 8.2 repository. I set up the schedules through the browser using the "Manage Schedules" page, and they seemed to be running fine.
Unfortunately, after 10 jobs have run, the scheduler simply stops executing them. "Last Run" stays the same, but "Next Run" keeps advancing even though nothing actually runs. There are no errors from the jobs themselves: the logs show "Job execution finished" every time, and I've verified that the data is being transferred.
I can kick off any job manually without issue by going into "Browse Files" and double-clicking it. But once 10 scheduled jobs have run, selecting a job on the Schedules page and hitting "Execute Now" does nothing until I restart the server.
I've been digging into this for over 8 hours and have learned a fair amount about Quartz along the way. It's now clear to me that the issue lies somewhere between Pentaho and Quartz.
The 10-job limit coincides with the Quartz thread limit in the configuration. Querying the Quartz DB, I found that the `qrtz5_fired_triggers` table fills up to 10 rows (presumably the thread-pool limit) and every job stays in the "EXECUTING" state even after it has finished successfully. The rows in `qrtz5_triggers` are all "WAITING".
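For anyone who wants to check their own instance, these are roughly the queries I ran against the Quartz database (assuming the standard Quartz JDBC-JobStore schema with the `qrtz5_` table prefix that my Pentaho install uses; your prefix may differ):

```sql
-- Rows here pile up to 10 and never clear; STATE stays 'EXECUTING'
SELECT trigger_name, instance_name, state, fired_time
FROM qrtz5_fired_triggers;

-- Meanwhile every trigger reports 'WAITING', so Quartz thinks nothing is due
SELECT trigger_name, trigger_group, trigger_state, next_fire_time
FROM qrtz5_triggers;
```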
The only way to get it running again is to restart the server. It looks like something on the Pentaho side is failing to update the state in Quartz, but there are no errors in any of the logs.
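In theory I could clear the stale state by hand instead of restarting, with something like the statement below, but I'd much rather fix the root cause than script this (untested idea on my part; it assumes the same `qrtz5_` prefix, that `fired_time` is epoch milliseconds as in the standard Quartz schema, and that no job is genuinely mid-flight; back up the tables first):

```sql
-- Remove fired-trigger rows that have been 'EXECUTING' for over an hour
BEGIN;
DELETE FROM qrtz5_fired_triggers
WHERE state = 'EXECUTING'
  AND fired_time < (EXTRACT(EPOCH FROM now() - interval '1 hour') * 1000)::bigint;
COMMIT;
```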
I've tried recreating a job from scratch, wiping the whole schedule, and scheduling only that brand-new job, but it still runs on schedule only 10 times in total because the state is never updated.
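For completeness, this is the thread limit I'm referring to. In my install it's set in `pentaho-solutions/system/quartz/quartz.properties` (path may vary), and the default of 10 lines up exactly with the 10-run ceiling. I assume raising it would only delay the stall rather than fix it, since the fired triggers never release:

```properties
# Default Pentaho thread pool settings; threadCount caps concurrent fired triggers
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
```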
Server Info:
- Ubuntu 18.04 LTS (64 bit)
- AWS EC2 instance m5.large (8 GB RAM, 2 vCPU)
- PostgreSQL 10.14-R1 (AWS RDS db.t3.large)
Any suggestions would be appreciated, thanks!
#PentahoDataIntegrationPDI #Pentaho #Kettle