Caitlin Croft

PDI Development and Lifecycle Management

Blog Post created by Caitlin Croft on Feb 7, 2019

Our Customer Success & Support teams are always working to provide our customers with tips and tricks that will help them with the Pentaho platform.


PDI Development and Lifecycle Management


A successful data integration (DI) project incorporates design elements that let your DI solution integrate and transform your data in a controlled manner. Planning the project before development work begins helps your developers work efficiently.


While the data transformation and mapping rules will fulfill your project’s functional requirements, there are other nonfunctional requirements to keep in mind as well. Consider the following common project setup and governance challenges that many projects face:


  • Multiple developers will be collaborating, and such collaboration requires development standards and a shared repository of artifacts.
  • Projects can contain many solutions, and artifacts will need to be shared across projects and solutions.
  • The solution needs to integrate with your Version Control System (VCS).
  • The solution needs to be environment-agnostic, requiring the separation of content and configuration.
  • Deployment of artifacts across environments needs to be automated.
  • Failed jobs should need only a simple restart, so restartability must be part of job design.
  • The finished result will be supported by a different team, so logging and monitoring should be put in place to support the solution.
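
Two of the points above, separating content from configuration and designing for restartability, can be illustrated with a minimal sketch. It is written in plain Python rather than PDI, and it is not taken from the best practices document: the per-environment `<env>.properties` layout, the checkpoint file, and the names `load_config` and `run_job` are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_config(config_dir: Path, env: str) -> dict:
    """Read KEY=VALUE pairs from <config_dir>/<env>.properties.

    The file layout is a hypothetical convention: the job logic stays the
    same in every environment and only this file changes.
    """
    config = {}
    for line in (config_dir / f"{env}.properties").read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config


def run_job(steps, config: dict, checkpoint: Path) -> list:
    """Run (name, callable) steps in order and return the names executed.

    Completed step names are recorded in a JSON checkpoint file, so a rerun
    after a failure skips the steps that already succeeded.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    executed = []
    for name, step in steps:
        if name in done:
            continue  # finished in an earlier run; the restart resumes after it
        step(config)  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        checkpoint.write_text(json.dumps(done))
        executed.append(name)
    return executed
```

With this shape, promoting a job from one environment to another means pointing `load_config` at a different properties file, and recovering from a failure means calling `run_job` again with the same checkpoint path.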


We have produced a best practices document, PDI Development and Lifecycle Management, that focuses on the essential and general project setup and lifecycle management requirements for a successful DI solution. It includes a DI framework that illustrates how to implement these concepts and separate them from your actual DI solution. The document is available for download from Pentaho Data Integration Best Practices.


Guidelines for Pentaho Server and SAML


Security Assertion Markup Language (SAML) is a specification that provides a means to exchange an authentication assertion of the principal (user) between an identity provider (IdP) and a service provider (SP). Once the SAML plugin is built and installed, your Pentaho Server becomes a SAML service provider, relying on the assertion from the IdP to provide authentication.
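
To make the idea of an assertion more concrete, here is a minimal sketch of what a service provider reads from one. It does not show how the Pentaho plugin works internally: it uses only the Python standard library, the sample assertion, the IdP URL, and the user name are invented, and it deliberately skips the signature, audience, and validity checks that a real service provider must perform before trusting an assertion.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SAML 2.0 specification for assertion elements.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Invented example assertion; a real one is signed and carries conditions.
SAMPLE_ASSERTION = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_example" Version="2.0" IssueInstant="2019-02-07T00:00:00Z">
  <saml:Issuer>https://idp.example.com</saml:Issuer>
  <saml:Subject>
    <saml:NameID>jdoe</saml:NameID>
  </saml:Subject>
</saml:Assertion>
"""


def read_assertion(xml_text: str) -> dict:
    """Return the issuer (IdP) and principal named in a SAML 2.0 assertion.

    Illustration only: no signature verification is performed here.
    """
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{SAML_NS['saml']}}}Assertion":
        assertion = root
    else:
        # The assertion may be wrapped, for example inside a protocol response.
        assertion = root.find(".//saml:Assertion", SAML_NS)
    if assertion is None:
        raise ValueError("no SAML assertion found")
    return {
        "issuer": assertion.findtext("saml:Issuer", namespaces=SAML_NS),
        "principal": assertion.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS),
    }
```

Calling `read_assertion(SAMPLE_ASSERTION)` yields the issuer and the principal, which are the two pieces of information the service provider relies on once the assertion has been validated.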


Security and server administrators, or anyone with a background in authentication or authorization with SAML, will be most interested in the Pentaho Server SAML Authentication with Hybrid Authorization document available for download from Advanced Security.