 PDI Production Options

Data Conversion posted 03-19-2018 11:53

Hi Everyone, I'm new to PDI and I'm a little confused about deployment. I've got transformations and jobs running locally that do what I want, but I need them to run on a schedule on a remote server.

I've read around this topic and have just confused myself: do I need a repository? Does it have to be backed by a database? Is this the same thing as Carte? Should I just get an Ubuntu VM with remote desktop and do it that way?

Apologies if I'm being slow, but I can't seem to find a single source of information on the simplest way to go from running jobs on my desktop to deploying them to production.

My use case is fairly simple: basic ETL from OLTP databases to an AWS Redshift data warehouse. It's only me who will be setting up and running jobs, and none of them are particularly intensive.

If anyone can point me in the direction of the simplest way to get up and running, I'd be very grateful. Doubly so if there's a way to do it using AWS EC2.

Many thanks.


#PentahoDataIntegrationPDI
#Pentaho
#Kettle
Brandon Jackson

A repository is just a centralized place to store your .kjb and .ktr files. If you have a mechanism to make those files available on the same disk as Pentaho Data Integration, then really all you need to do is use cron to run ./kitchen.sh (for jobs) or ./pan.sh (for transformations) at a specific time.
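For example (the project path here is made up, just to show the shape of the call), a manual run from the command line looks roughly like this:

cd /opt/pentaho/pdi/latest/data-integration
./kitchen.sh -file=/opt/pentaho/ETL/my_project/content/my_job.kjb -level=Basic

pan.sh takes the same -file= argument if you only need to run a single transformation, and -level controls how verbose the log output is.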

I would suggest a common layout for your ETL to make everything more deterministic.

project_named_directory/content      <- all .kjb and .ktr files go here

project_named_directory/input        <- all manner of flat-file input goes here

project_named_directory/output       <- if your ETL emits files, place them here

project_named_directory/environment  <- any properties files or standard connection settings go here to keep your PDI install clean (a rough example follows below). Just read in the properties and let your JDBC connections use those variables; that saves you the hassle of mucking up your PDI /simple-jndi files, /home/pentaho/.kettle/kettle.properties, or /home/pentaho/.kettle/shared.xml.
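As a rough example (every name and value below is made up, not anything from a real setup):

# project_named_directory/environment/redshift.properties  (hypothetical example)
REDSHIFT_HOST=my-cluster.example.us-east-1.redshift.amazonaws.com
REDSHIFT_PORT=5439
REDSHIFT_DB=warehouse
REDSHIFT_USER=etl_user
REDSHIFT_PASSWORD=change_me

Load those into variables (for instance via kettle.properties or a Set Variables step at the start of the job), and the database connection dialog can then reference ${REDSHIFT_HOST}, ${REDSHIFT_PORT}, and so on instead of hard-coded values.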

A cron example that runs a job at 1 AM every day:

# minute (0-59)
#       hour (0-23)
#               day of the month (1-31)
#                       month of the year (1-12)
#                               day of the week (0-6, 0 = Sunday)
#                                       command
### Budgeted Census
0       1       *       *       *       cd /opt/pentaho/pdi/latest/data-integration; ./kitchen.sh -file=/opt/pentaho/ETL/Build\ Budgeted\ Census\ Data/content/build_budgeted_census_data.kjb;
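If you also want each run captured in a log file (the log path below is just an example), append a redirect to the same entry:

0       1       *       *       *       cd /opt/pentaho/pdi/latest/data-integration; ./kitchen.sh -file=/opt/pentaho/ETL/Build\ Budgeted\ Census\ Data/content/build_budgeted_census_data.kjb >> /var/log/pentaho/budgeted_census.log 2>&1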

Data Conversion

Thanks guys, much appreciated.

I'll have a go with the cron method above.