Plugin Machine Intelligence Version 1.5 is Available!
Plugin Machine Intelligence (PMI) version 1.5 from Hitachi Vantara Labs is now available for download from the Pentaho Marketplace. Version 1.5 includes several key features and add-ons, including:
- TensorFlow support for deep learning and other machine learning algorithms
- Transfer learning
- eXtreme Boosting classifier and regressor algorithms
- Spark MLlib 2.4
- As part of the PMI data science suite, PMI Visuals for data exploration
You can install the PMI v1.5 plugin, as well as the PMI Visualization plugin (see this blog for more details), directly from the Marketplace in your Spoon environment. We recommend manually deleting the older PMI version from your PDI plugins/steps folder before installing this new version.
First and foremost, a new engine, Keras/TensorFlow, has been integrated into PMI. That brings the total to six machine learning engines built into PMI for supervised machine learning and deep learning, including:
- Spark MLlib
- R - MLR
- Python – Scikit-Learn
- Deep Learning for Java (DL4j)
So DL4j and TensorFlow, along with transfer learning, can now leverage properly configured graphics processing units (GPUs). However, at the time of writing, TensorFlow GPU acceleration is not supported on OSX. Hopefully Google and Apple will see fit to fix this oversight; when they do, we will test PMI v1.5 on that combination.
Integrating Keras/TensorFlow into PMI also brings the ability to do “transfer learning”. From Wikipedia: “Transfer learning (TL) is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.” We’ll cover TL in more detail in a later blog.
A new algorithm has also been added to the PMI palette: eXtreme Boosting for classification and regression. What makes this algorithm exciting is that it can both train and infer on a GPU. Typically, classical machine learning algorithms operate on structured data (rows and fields), while deep learning algorithms operate on unstructured data such as images (though some work is being done to apply deep learning to structured data problems; more on this in a later blog). The eXtreme Boosting algorithm operates on structured data AND lets you train your model on a GPU, and a model trained on a GPU will also infer (execute) on a GPU. In short, you can now apply machine learning to your structured data with hardware acceleration. We plan to test use cases with this new algorithm on large datasets, comparing its accuracy with other algorithms and measuring the performance (speed) gains on GPU. Stay tuned for future blogs on this one.
PMI v1.5 also supports Spark MLlib release 2.4. As with the previous Spark version, it runs on your local system.
Finally, a new capability for visualizing your data is part of the PMI suite of data science tools. PMI Visualization is a separate plugin available from the Marketplace; you can read more about it in this blog.
The easiest way to get started with Keras/TensorFlow is to install Anaconda (Python 3.7 version) for your platform of choice. You can use other Python distributions, either Python 2 or 3, but we have found that starting with this Anaconda version is the quickest way to get going. After installing Anaconda, enter the following commands to set up PMI with Keras/TensorFlow:
1. conda init
The Anaconda installer asks whether you want to run this step; if you decline, you can enter it yourself.
2. conda install tensorflow==1.13.1
This is the version we found works best. There is a bug in TensorFlow 1.14, and we will be testing TensorFlow 2.0 soon.
3. conda install keras==2.2.5
The combination of Anaconda (Python 3.7), TensorFlow 1.13.1, and Keras 2.2.5 seems to be the most stable for us.
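If you want to confirm that your Python environment matches this combination before pointing PMI at it, a small script along the following lines can help. It is a minimal sketch and not part of PMI: the function names are hypothetical, and it assumes both packages expose the conventional `__version__` attribute (TensorFlow 1.x and Keras 2.x do).

```python
import importlib

# Package versions the PMI team reports as the most stable combination.
TESTED_VERSIONS = {
    "tensorflow": "1.13.1",
    "keras": "2.2.5",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(installed):
    """Compare installed package versions with the tested combination.

    `installed` maps a package name to its version string (None if missing).
    Returns a list of human-readable problems; an empty list means a match.
    """
    problems = []
    for name, wanted in sorted(TESTED_VERSIONS.items()):
        found = installed.get(name)
        if found is None:
            problems.append("{}: not installed (tested: {})".format(name, wanted))
        elif found != wanted:
            problems.append("{}: {} installed (tested: {})".format(name, found, wanted))
    return problems


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in TESTED_VERSIONS}
    issues = find_mismatches(installed)
    if issues:
        print("Differs from the tested PMI v1.5 combination:")
        for issue in issues:
            print("  - " + issue)
    else:
        print("tensorflow 1.13.1 and keras 2.2.5 found")
```

Run it with the same `python` interpreter that PMI is configured to use, so that it inspects the environment PMI will actually load.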
The updated installation guides for PMI v1.5 can be viewed here for your platform. These guides provide a more detailed description of how to set up your environment for all the PMI ML engines.
There is also an updated developer’s guide.
And finally, there is a new sample dataset from Kaggle, “10-monkeys”, with a sample PDI transformation for deep learning, as well as the champion/challenger example transformation and dataset. They can be accessed from: