Ken Wood

Announcing PMI v1.4: Deep Learning is Here!

Blog post created by Ken Wood on Nov 14, 2018

Easy to Use Deep Learning and Neural Networks with Pentaho

By Ken Wood and Mark Hall

 

HVLabsLogo.png

Hitachi Vantara Labs is excited to release a new version of its experimental plugin, Plugin Machine Intelligence (PMI) version 1.4. Along with several minor fixes and enhancements, this release adds a new execution engine for deep learning and for running other machine learning algorithms with neural networks. The mission of Pentaho and Hitachi Vantara Labs is to make complex technology simple to use and deploy, and PMI is a big step toward making machine learning and artificial intelligence part of that mission.

 

Back in October, I shared a glimpse of what's coming in a blog post, Artificial Intelligence with Pentaho, which describes a demonstration built from artificial intelligence elements. Deep learning with PMI and Pentaho Data Integration (PDI) is the main capability that enables that demonstration. Feel free to ask us questions about using deep learning models in PDI transformations. We will also be blogging more details and how-tos about that demonstration and how to build some of those elements with PDI.

 

We call this plugin "experimental" because it is a research project from HV Labs that is released openly for the Pentaho community and users to try out and experiment with. We refer to this as "early access to advanced, experimental capabilities". As such, it is not a supported product or service at this time.

 

Deep learning is a recent addition to the artificial intelligence domain of machine learning. PMI initially focuses on supervised machine learning schemes, which means there is a continuous or categorical target variable that is "learned" from a dataset of labeled training data. This deep learning integration is also a supervised learning scheme.
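To make "learning a categorical target from labeled training data" concrete, here is a minimal sketch in plain Python. It is not PMI code and does not use any of PMI's engines. It fits a small logistic regression by gradient descent on a made-up dataset, and the feature values, labels, function names and settings (`lr`, `epochs`) are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=1000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one row of features."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical labeled training data: two already-standardized numeric
# features per row, and a categorical target with classes 0 and 1.
rows = [[-1.5, -1.0], [-1.0, -1.2], [-0.8, -1.5],
        [1.0, 1.2], [1.4, 0.9], [0.9, 1.6]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
print(predictions)
```

The labeled rows play the role of the training dataset, and `labels` is the categorical target variable. Swapping in a continuous target and a squared-error loss would turn the same loop into a regression.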

 

AIDomainsDiagram.png

 

The new release of PMI v1.4 can be downloaded and installed from the PDI and Spoon Marketplace. If you are already running a previous version of PMI, check the installation documentation for guidance on getting your system ready for PMI v1.4. If you are not using PMI at all, the Marketplace will install the new PMI v1.4 for you. During installation from the Marketplace, PMI automatically installs WEKA, Spark MLlib and Deep Learning for Java (DL4j) as included machine learning engines. You will need to install and set up Python with the scikit-learn library and R with the Machine Learning in R (MLR) library yourself. The installation process will then configure them into PMI, provided they are installed and set up correctly. Again, check the installation documentation for your system.
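Before installing, it can help to confirm that the Python and R prerequisites are visible on your machine. The sketch below is only an illustration and is not how PMI itself detects these libraries. It assumes scikit-learn is importable as `sklearn` by the same Python interpreter PDI will use, and that R provides an `Rscript` executable on your PATH. The helper names are hypothetical.

```python
import importlib.util
import shutil


def python_module_available(module_name):
    """Return True if this Python interpreter can locate the named module."""
    return importlib.util.find_spec(module_name) is not None


def executable_on_path(name):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


# Assumed names: "sklearn" is the import name of scikit-learn, and
# "Rscript" is the command-line front end that ships with R.
report = {
    "scikit-learn (imported as sklearn)": python_module_available("sklearn"),
    "Rscript executable": executable_on_path("Rscript"),
}

for item, found in report.items():
    print(f"{item}: {'found' if found else 'missing'}")
```

A "missing" result only means this particular interpreter or PATH cannot see the component. The PMI installation documentation remains the reference for what your system actually needs.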

 

This means there are now five machine learning execution engines integrated in PMI for PDI, giving you many options for training, building, evaluating and executing machine learning models. In fact, some of the existing machine learning algorithms that are available for WEKA, scikit-learn, MLlib and MLR can also execute on DL4j, like Logistic Regression, Linear Regression and Support Vector Classifier. There are also two new machine learning algorithms "exposed" from the scikit-learn, WEKA and MLR libraries: the Multi-layer Perceptron Classifier and the Multi-layer Perceptron Regressor. These algorithms were exposed from the scikit-learn library to help us write some additional developer documentation on how to expose algorithms to the PMI framework.

PMIDLLogo.png

 

Of course, the most exciting part of this release is the ability to train, build, evaluate and execute deep learning models with PDI. Stated another way, it brings the ability to analyze unstructured data with PDI. In addition, by using DL4j, you can train your deep learning models on a locally attached graphics processing unit (GPU), either internal to your system or externally attached, like an eGPU. DL4j uses the CUDA API from NVIDIA and thus only works with NVIDIA GPUs at this time. For image processing, training on a GPU is much faster than training on a CPU.

TrainingTimes.png

 

 

GPUTrainingDiagram2.png

 

 

There is a lot of reference material available to help you get started with PMI, including some new installation documents that explain how to set up PMI v1.4 and how to set up your GPU and CUDA environment for DL4j. The list of materials and references can be found at this location.

 

 

 

 

IMPORTANT NOTE:

It is important to point out that this initiative is not formally supported by Hitachi Vantara, and there are no current plans on the Enterprise Edition roadmap to support PMI at this time. It is recommended that this experimental feature be used for testing, educational and exploration purposes only. PMI is supported by Hitachi Vantara Labs and the community. Hitachi Vantara Labs was created to formally test out new ideas, explore emerging technologies and, as much as possible, share our prototypes with the community and users through the Hitachi Vantara Marketplace. We like to refer to this as "providing early access to advanced capabilities". Our hope is that the community and users of these advanced capabilities will help us improve them and recommend additional use cases. Hitachi Vantara has forward-thinking customers and users, so we hope you will download, install and test this plugin. We would appreciate any and all of your comments, ideas and opinions.
