

 How to set up PDI CE for a production environment?

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
heta desai posted 09-13-2019 12:28

I'm new to PDI. I want to use PDI CE in a production environment, but I don't understand how to set up the environment.

 

I have installed PDI CE on my local machine and designed a job in the Spoon IDE. Now I want to use it in production. What kind of environment do I need to set up?

 

What is the Carte server used for? Do I need a standalone machine for it? Will it work as my production server?

 

I want to do everything in the Community Edition only.


David da Guia Carvalho

First, what do you want to accomplish with a production environment?

A basic PDI server can be as simple as a desktop install (uncompress the archive) that executes kitchen/pan, or as complex as a full solution with other systems involved (GitLab, Jenkins, cc, ct, custom scripts, Carte, etc.).

 

Some things to consider:

1 - Repository (Filesystem, DB, BA);

2 - Level of separation between dev and prod (deploy);

3 - Control of execution (scheduler/remote/web);

4 - Developer access level separation per user/group;

5 - ETL access to sources (Connections);

6 - Resources available on the server;

 

The Carte server can be used to execute jobs/transformations remotely. Carte boils down to a web front end/service for remote execution.
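
To make the "web service" point concrete, here is a minimal sketch that only builds the URL of Carte's status page. The host name and port are hypothetical, and the `/kettle/status/` path with `xml=Y` is how Carte exposes its status as far as I know, so check it against your PDI version before relying on it.

```python
from urllib.parse import urlencode, urlunsplit


def carte_status_url(host: str, port: int, as_xml: bool = True) -> str:
    """Build the URL of a Carte server's status page.

    host and port are whatever Carte was started with (hypothetical here).
    """
    # xml=Y asks Carte for a machine-readable answer instead of an HTML page.
    query = urlencode({"xml": "Y"}) if as_xml else ""
    return urlunsplit(("http", f"{host}:{port}", "/kettle/status/", query, ""))


# Hypothetical server name and port, for illustration only.
print(carte_status_url("etl-server.example.com", 8081))
```

Carte also asks for a user name and password, so a real client would have to send credentials along with the request.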

 

 

Overall... KISS! If your demands require just a simple install, do it!

heta desai

I have different sources from which I will be getting flat files. I'm going to transform them and load them into the DW. So I'm really confused about the use of the Carte server. Can I deploy a job on the Carte server to schedule the batch for production?

Ana Gonzalez

OK, by your non-answer to David Carvalho, you seem to be a developer working by yourself, without having thought about infrastructure requirements, working in a team with other people, or the future maintenance of what you're creating right now.

 

If that is the case, you don't need a Carte server. You can keep your jobs and transformations in kjb/ktr files, upload them to the server in whichever way suits you, and schedule the execution using the scheduling feature provided by the server (cron on a Linux server, or whatever Windows servers offer combined with PowerShell scripts to schedule batch executions).

You call the kitchen.sh script (Kitchen.bat on Windows) provided with the regular PDI install to schedule the execution of your jobs, and the pan.sh script (Pan.bat on Windows) if you need to schedule the execution of a transformation. Those scripts work the same as spoon.sh (Spoon.bat), but they don't launch a window to create the jobs and transformations, they just run them.
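
As a rough illustration of the kitchen/pan split described above, the sketch below picks the right script from the file extension and builds the command line. The install directory and file paths are made up; `-file=` and `-level=` are the standard kitchen/pan options as I understand them.

```python
from pathlib import PurePosixPath


def pdi_command(pdi_home: str, artifact: str) -> list[str]:
    """Return the command line that runs a .kjb job or a .ktr transformation."""
    suffix = PurePosixPath(artifact).suffix.lower()
    if suffix == ".kjb":
        script = "kitchen.sh"   # kitchen runs jobs
    elif suffix == ".ktr":
        script = "pan.sh"       # pan runs transformations
    else:
        raise ValueError(f"not a PDI job or transformation: {artifact}")
    return [str(PurePosixPath(pdi_home) / script), f"-file={artifact}", "-level=Basic"]


# Hypothetical paths, for illustration only.
print(pdi_command("/opt/pdi", "/etl/load_dw.kjb"))
print(pdi_command("/opt/pdi", "/etl/clean_files.ktr"))
```

The resulting list can be handed to `subprocess.run` or written into a script that cron calls.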

 

If you are not working by yourself, and have to keep in mind a team or another person who will maintain your work, develop their own integration jobs with Pentaho, etc., talk with your boss or your teammates to answer @David Carvalho's questions. That will tell you what kind of workflow you need to set up to move development work to the production server, and whether a Carte server is needed.

Regards

heta desai

I'm working in a team. We are exploring PDI for our upcoming project, as it is new to us. The team consists of developers as well as infrastructure handlers.

 

My basic project requirements are these: every day we will receive data (most probably flat files) at the end of the day at our data center (the source in our case), and we need to process it and load it into the data warehouse. So we will schedule a batch execution of the ETL at midnight.

 

As far as I know, the development and production environments should be separate.

 

Based on these basic requirements, I want to understand what type of environment we need for development as well as for production.

 

We have to use the Community Edition only. Let me know if there is any functionality we need, based on these requirements, that is not available in CE.

David da Guia Carvalho

You have a very simple requirement, so let's keep it simple!

 

  • Development;
    • As you only process flat files and have no limitations on access to a source, the development environment can be the developers' workstations;
    • Keep your repository on the filesystem (you don't need to set up a real PDI repository); that way you can use ANY version control system without adding complex tasks (like syncing the repository);

 

  • Production;
    • Set up PDI on a server;
    • Check for proper resources (CPU, memory, disk, etc.) based on your usage;
    • Sync your files (.ktr, .kjb) from your version control to your server (you can use a simple script and a production branch);
    • Use kettle.properties and jdbc.properties (JNDI) to set up variables and connections;
    • Create simple (shell) scripts to call jobs and transformations (with proper parameters), save the log, and/or send an email about the execution;
    • Use cron, Control-M, or any other scheduler to execute your scripts (jobs/transformations);

 

That is it... a very simple structure!
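
To give an idea of what such a wrapper script could look like, here is a minimal sketch in Python rather than shell. It calls kitchen with named parameters, appends the output to a log file, and returns kitchen's exit code. All paths and the `INPUT_DIR` parameter are hypothetical; `-param:NAME=value` is the kitchen option for named parameters as I understand it.

```python
import subprocess


def kitchen_args(kitchen: str, job_file: str, params: dict[str, str]) -> list[str]:
    """Build the kitchen command line for one job and its named parameters."""
    args = [kitchen, f"-file={job_file}", "-level=Basic"]
    # Each named parameter of the job is passed as -param:NAME=value.
    args += [f"-param:{name}={value}" for name, value in params.items()]
    return args


def run_job(kitchen: str, job_file: str, params: dict[str, str], log_path: str) -> int:
    """Run the job, append kitchen's output to log_path, return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log:
        result = subprocess.run(
            kitchen_args(kitchen, job_file, params),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    # A non-zero exit code means the job failed; the scheduler or a
    # notification step can react to it.
    return result.returncode


# Example call (hypothetical paths and parameter name):
# run_job("/opt/pdi/kitchen.sh", "/etl/load_dw.kjb",
#         {"INPUT_DIR": "/data/incoming"}, "/var/log/etl/load_dw.log")
```

A crontab entry such as `0 0 * * * /usr/bin/python3 /opt/etl/run_nightly.py` (paths again made up) would start the script at midnight.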

heta desai

Development:

  • Does "workstations" mean the PDI client that runs on each developer's local machine?

 

Production:

  • Is there any special kind of setup for the server?

 

David da Guia Carvalho

1 - Yes

 

2 - Just tuning Java (Xmx, Xms), connections, drivers;

 

The rest is extras per project (like sync with repo, logs, notification email, etc);
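
For the Java tuning, one common approach (an assumption on my side, check the spoon.sh shipped with your version) is to set the `PENTAHO_DI_JAVA_OPTIONS` environment variable before calling kitchen or pan. A small sketch that prepares such an environment, with heap sizes chosen purely as an example:

```python
import os
from typing import Mapping, Optional


def pdi_env(min_heap: str, max_heap: str,
            base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with the JVM heap settings for PDI."""
    env = dict(os.environ if base_env is None else base_env)
    # -Xms is the initial heap size, -Xmx the maximum heap size.
    env["PENTAHO_DI_JAVA_OPTIONS"] = f"-Xms{min_heap} -Xmx{max_heap}"
    return env


# Example values only; size the heap to the server's memory and your data volume.
env = pdi_env("1g", "4g")
print(env["PENTAHO_DI_JAVA_OPTIONS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching kitchen.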

heta desai

Thank you so much, David. You have cleared all my doubts.

Dave Barnett

Good question, and one for which I've struggled to find any decent documentation.

 

Basically, PDI Server can hold a central repository (or repositories), and the server can be used to schedule the jobs within them. There are also options for analytics-type work, such as creating visualisations.

 

The PDI client can be used to develop code that is then pushed to the repository on the PDI server to be executed (so the PDI client can be installed on a developer's laptop, for example, while the server is hosted on an EC2 instance in AWS). The PDI client can also be used to run jobs directly, though, usually via the kitchen command-line tool executed by cron or Windows Task Scheduler.

 

As far as I'm aware, there is a community version of PDI server.