So, in all of the cases where I’ve blogged or talked about running the Plug-in Machine Intelligence (PMI) plugin for Pentaho Data Integration (PDI), it has been in a workstation-like environment, basically a high-end laptop. This is more than adequate for building machine learning models, and many deep learning models, on interesting datasets. However, not every dataset fits in the constrained environment of a laptop, and I don’t just mean the storage size of the dataset. Sometimes you simply need a large machine and a powerful server-class GPU to process complex datasets.
In this case, we are processing a deep learning dataset that consists of 101 classes of different foods. The entire dataset takes up only 5.2GB of storage, which on the surface isn’t a large amount of data. Even if you break the color images into their RGB channels (three images each) for preprocessing, that’s only about 16GB. No, the “complexity size” comes from the nature of the dataset. The Food-101 dataset contains 1,000 images per food class, for a total of 101,000 images (files). Now multiply that by the three RGB channels, add the preprocessing and processing involved in training and building a deep learning model, and you start to see that your laptop or workstation may well be overwhelmed. Even if you could force some file caching and other mechanisms to fit the dataset within the confines of your environment, building your deep learning models could take a long time.
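The scale figures above are simple arithmetic, so here is a small sketch that reproduces them from the numbers quoted in the text (101 classes, 1,000 images per class, 5.2GB on disk). Like the estimate in the text, it assumes each split channel image takes roughly as much space as the original file; that is a simplification, not a measurement.

```java
// Back-of-the-envelope scale of the Food-101 dataset, using only the
// figures quoted in the text (101 classes, 1,000 images per class, 5.2 GB).
public class DatasetScale {
    static final int CLASSES = 101;
    static final int IMAGES_PER_CLASS = 1_000;
    static final int RGB_CHANNELS = 3;

    // Total number of image files in the dataset.
    static int totalImages() {
        return CLASSES * IMAGES_PER_CLASS;
    }

    // Number of single-channel images if every image is split into R, G and B.
    static int channelPlanes() {
        return totalImages() * RGB_CHANNELS;
    }

    // Rough storage estimate for the split channels, assuming (as the text does)
    // that each channel image takes about as much space as the original file.
    static double channelSplitGb(double datasetGb) {
        return datasetGb * RGB_CHANNELS;
    }

    public static void main(String[] args) {
        System.out.printf("Images: %,d%n", totalImages());
        System.out.printf("Channel images: %,d%n", channelPlanes());
        System.out.printf("Channel-split storage: ~%.1f GB%n", channelSplitGb(5.2));
    }
}
```

The storage estimate lands at about 15.6GB, which is where the “about 16GB” figure comes from; the file count (over 300,000 channel images) is what actually strains a smaller machine.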
Here we’ll describe what happens when we have access to a Hitachi Vantara DS225 server with a lot of memory, a lot of CPUs, and an NVIDIA Tesla V100 GPU. We’re going to use PDI and PMI to do some data science work, building deep learning models against this 101-class dataset. You could just as easily use your favorite data science tools and code your own models, and they would perform equally well, but this is an experiment with PDI, PMI, and deep learning, so we’ll use PDI and PMI.
This project compares CPUs against a GPU on actual performance (in this case, speed) using only PDI and PMI. Very little attempt was made to tune the model for accuracy or speed. AlexNet was selected from the PMI/DL4j model zoo to run this experiment, and the only modification made was to increase the PDI/Spoon heap to 128GB so it could hold the dataset and processing. The PDI transformation below, which trains and tests a deep learning model with PMI and Deeplearning4j (DL4j), was first run without the NVIDIA CUDA API and drivers configured. DL4j therefore could not “see” the NVIDIA Tesla V100 GPU and executed the preprocessing, training, and testing of the deep learning model on the CPUs and system memory. On this server, that means two Intel Xeon Gold 6154 18-core 3GHz CPUs, for a total of 36 cores, and 512GB of DDR4 RAM.
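Raising the Spoon heap is a JVM setting. In the PDI releases I’m familiar with, the launch scripts (spoon.sh / Spoon.bat) take their JVM options from the PENTAHO_DI_JAVA_OPTIONS environment variable, so including something like -Xmx128g there is the usual route; treat that variable as an assumption to verify against your PDI version. The sketch below is a hypothetical standalone check, not part of PDI or PMI: launched with the same JVM options, it reports the heap ceiling the JVM actually received and the processor count it sees.

```java
// Hypothetical sanity check that a JVM launched with a large heap setting
// (for example -Xmx128g) actually received it.
// Run with: java -Xmx128g HeapCheck.java
public class HeapCheck {
    static final long BYTES_PER_GIB = 1024L * 1024L * 1024L;

    // Convert a byte count to whole GiB (rounded down).
    static long toGib(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    // Physical cores for a multi-socket server, e.g. 2 sockets x 18 cores.
    static int physicalCores(int sockets, int coresPerSocket) {
        return sockets * coresPerSocket;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the heap ceiling the JVM will attempt to use; it can
        // come out slightly below the -Xmx value depending on the garbage collector.
        System.out.printf("Max heap: ~%d GiB%n", toGib(rt.maxMemory()));
        // availableProcessors() counts logical CPUs, so with Hyper-Threading
        // enabled it reports hardware threads rather than physical cores.
        System.out.printf("Logical processors: %d%n", rt.availableProcessors());
        System.out.printf("Physical cores (2 x 18): %d%n", physicalCores(2, 18));
    }
}
```

On a two-socket Xeon Gold 6154 machine with Hyper-Threading enabled, expect the logical processor count to be double the 36 physical cores.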
Once that model was completed, CUDA was installed and configured for PMI and DL4j to use. Here the NVIDIA Tesla V100 GPU has 16GB of HBM2 memory and 5,120 CUDA cores. The exact same transformation was then executed again.
This is the transformation used for training the deep learning models.
The PMI algorithm configuration settings are very close to the defaults.
As you can see in this chart, the reduction in training time (in minutes) is dramatic: the GPU completed a deep learning model up to 18 times faster than the top-of-the-line CPUs did.
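To put “up to 18 times faster” in concrete terms, speedup is the ratio of CPU training time to GPU training time. The symbols below are my own shorthand, not values read from the chart:

```latex
S = \frac{T_{\text{CPU}}}{T_{\text{GPU}}}, \qquad
S = 18 \;\Rightarrow\; T_{\text{GPU}} = \frac{T_{\text{CPU}}}{18} \approx 0.056\,T_{\text{CPU}}
```

In other words, at the best observed speedup the GPU run takes roughly 5.6% of the CPU run’s time, a reduction of about 94%.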
A Reference Architecture White Paper detailing this experiment, and how all the pieces were put together and installed, has been written. It will be released this fall to coincide with the Hitachi Vantara NEXT 2019 conference in Las Vegas.