
Easy to Use Deep Learning and Neural Networks with Pentaho

By Ken Wood and Mark Hall

 


Hitachi Vantara Labs is excited to release a new version of the experimental plugin, Plugin Machine Intelligence version 1.4. Along with several minor fixes and enhancements, this release adds a new execution engine for performing deep learning and for running other machine learning algorithms on neural networks. The mission of Pentaho and Hitachi Vantara Labs is to make complex technology simple to use and deploy, and Plugin Machine Intelligence (PMI) is a big step towards bringing machine learning and artificial intelligence into that mission.

 

Back in October, I shared a glimpse of what's coming in a blog post, Artificial Intelligence with Pentaho, that describes a demonstration built from artificial intelligence elements. PMI and Pentaho Data Integration (PDI) with deep learning form the main artificial intelligence capability that enables that demonstration. Feel free to ask us more questions about the use of deep learning models in PDI transformations. We will also be blogging more details and "how to" material about that demonstration and how to build some of its elements with PDI.

 

We call this plugin "experimental" because it is a research project from HV Labs, released openly for the Pentaho community and users to try out and experiment with. We refer to this as "early access to advanced, experimental capabilities". As such, it is not a supported product or service at this time.

 

Deep learning is a recent addition to the machine learning domain of artificial intelligence. PMI initially focuses on supervised machine learning schemes, which means there is a continuous or categorical target variable that is "learned" from a dataset of labeled training data. This deep learning integration is also a supervised learning scheme.

 

[Diagram: the domains of artificial intelligence, machine learning and deep learning]

 

The new release, PMI v1.4, can be downloaded and installed from the PDI (Spoon) Marketplace. If you are already running a previous version of PMI, check the installation documentation for guidance on getting your system ready for PMI v1.4. If you are not using PMI at all, the Marketplace will install the new PMI v1.4 for you. During the PMI v1.4 installation from the Marketplace, PMI automatically installs three bundled machine learning engines: WEKA, Spark MLlib and Deep Learning for Java (DL4j). You will need to install and set up Python with the scikit-learn library, and R with the Machine Learning in R (MLR) library, yourself; if they are installed and set up correctly, the installation process will configure them into PMI. Again, check the installation documentation for your system.
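
If you are preparing the Python side yourself, a quick sanity check like the following can help; this is a minimal sketch to be run with the Python interpreter that PDI will invoke, not the official PMI verification procedure.

    # Minimal sanity check: confirm which Python interpreter is active and
    # that the scikit-learn library PMI depends on can be imported.
    import sys
    import sklearn

    print(sys.executable)       # the interpreter PDI should be configured to use
    print(sklearn.__version__)  # proves scikit-learn is installed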

 

This means there are now 5 machine learning execution engines integrated into PMI for PDI, providing many options for training, building, evaluating and executing machine learning models. In fact, some of the existing algorithms available through WEKA, scikit-learn, MLlib and MLR, such as logistic regression, linear regression and the support vector classifier, can also execute on DL4j. There are also 2 new machine learning algorithms "exposed" from the scikit-learn, Weka and MLR libraries: the Multi-Layer Perceptron Classifier and the Multi-Layer Perceptron Regressor. Exposing these algorithms from the scikit-learn library also gave us the material for additional developer documentation on how to expose algorithms to the PMI framework.
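
For a sense of what sits underneath those two new steps, here is a minimal standalone sketch of the corresponding scikit-learn estimator on a synthetic toy dataset; it illustrates the underlying library call, not the PMI step configuration itself.

    # Toy illustration of scikit-learn's multi-layer perceptron classifier,
    # one of the estimators PMI exposes. The data here is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=1)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))

The Multi-Layer Perceptron Regressor works the same way via MLPRegressor, with a numeric target instead of a class label.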

 

Of course, the most exciting part of this release is the ability to train, build, evaluate and execute deep learning models with PDI. Stated another way, it is the ability to analyze unstructured data with PDI. In addition, by using DL4j, you can train your deep learning models on a locally attached graphics processing unit (GPU), either internal to your system or externally attached, like an eGPU. DL4j uses the CUDA API from NVIDIA and thus only supports NVIDIA GPUs at this time. Training image-processing models on a GPU is dramatically faster than training on a CPU.

 

 

[Diagram: GPU-accelerated deep learning training with PDI and DL4j]

 

 

There is a lot of reference material available to help you get started with PMI, including new installation documents covering how to set up PMI v1.4 and how to set up your GPU and CUDA environment for DL4j. The list of materials and references can be found at this location.

 

 

 

 

IMPORTANT NOTE:

It is important to point out that this initiative is not formally supported by Hitachi Vantara, and there are currently no plans on the Enterprise Edition roadmap to support PMI. It is recommended that this experimental feature be used for testing, educational and exploration purposes only. PMI is supported by Hitachi Vantara Labs and the community. Hitachi Vantara Labs was created to formally test out new ideas, explore emerging technologies and, as much as possible, share our prototypes with the community and users through the Hitachi Vantara Marketplace. We like to refer to this as "providing early access to advanced capabilities". Our hope is that the community and users of these advanced capabilities will help us improve them and recommend additional use cases. Hitachi Vantara has forward-thinking customers and users, so we hope you will download, install and test this plugin. We would appreciate any and all of your comments, ideas and opinions.

In addition to the LiDAR Motion Sensor real-time data feed from the 8th floor lobby of the HLDS facility (described below), we've added another sensor to the configuration. The new real-time sensor data comes from a prototype sensor being developed by the same Hitachi LG Data Systems (HLDS) team that develops the LiDAR. This sensor is a Particulate Matter sensor, or dust sensor. We thought it would be an interesting combination of sensor data to detect human traffic AND the amount of dust or particles being "kicked up" by this traffic. The lobby is a carpeted area.

 

 

In Korea, there is increasing concern about particulate matter and pollution in the environment coming from a neighboring country. This new sensor allows air quality to be monitored through the detection of particulate matter; there is a Particulate Matter, or PM, standard for defining dust in the air. While the eventual sensor device will be used both indoors and outdoors, today we are deploying the sensor indoors and making its data available for everyone to analyze. In the future, we will deploy an outdoor sensor to monitor air pollution in the city of Seoul.

 

The PM sensor data uses MQTT to publish its data. The real-time data feed can be accessed at the following MQTT broker and topic.

 

PLEASE NOTE:

There is a problem with the original broker and we have moved this
data stream to a new broker. Please note the new broker URL below.
Sorry for the inconvenience.

 

 

Broker location - tcp://mqtt.iot-hlds.com:1883

 

Topic - hlds/korea/8thFloor/lobbyDust

 

 

The data streamed from this sensor is a JSON-formatted message with the following fields:

 

  • Event: AirQuality - the event type
  • Time: TimeStamp - the time of the sample
  • PM1_0: particulate matter 1.0 micrometers and smaller - quantity in the sample
  • PM2_5: particulate matter 2.5 micrometers and smaller - quantity in the sample
  • PM10: particulate matter 10 micrometers and smaller - quantity in the sample
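
As a minimal sketch of consuming this feed outside of PDI, subscribing to the broker and topic above and printing the PM fields might look like the following. It uses the open-source paho-mqtt Python client (an assumption here, not something bundled with PMI).

    # Subscribe to the dust-sensor feed and print the particulate-matter fields.
    # Requires the paho-mqtt package (pip install paho-mqtt; 1.x-style API).
    import json
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        sample = json.loads(msg.payload)
        print(sample["Time"], sample["PM1_0"], sample["PM2_5"], sample["PM10"])

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("mqtt.iot-hlds.com", 1883)
    client.subscribe("hlds/korea/8thFloor/lobbyDust")
    client.loop_forever()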

 

Here is a screenshot of MQTT Spy inspecting these messages.

 

[Screenshot: MQTT Spy inspecting the dust-sensor messages]

What kinds of Pentaho transformations, dashboards and analyses can you create with this data? Is there a correlation between human traffic through the lobby and the amount of dust detected? We want to see your creations. Please share your work in the comments area below, or write up your own blog post and share it with us. Who knows, there might be something in it for you.

There are currently 3 Installation Guides to accompany the Plug-In Machine Intelligence (PMI) plug-in, plus one Developer's Guide. The demonstration transformations and sample datasets are also available; these are for demonstration and educational purposes. Everything can be downloaded from the following links:

 

Download Link and Document Name – Description

PMI_1.4_Installation_Linux.pdf – Installation guide for the Linux OS platform.
PMI_1.4_Installation_Windows.pdf – Installation guide for the Windows OS platform.
PMI_1.4_Installation_Mac_OSX.pdf – Installation guide for the Mac OS X platform.
PMI_Developer_Docs.pdf – A developer's guide to extending and contributing to the PMI framework.
PMI_MLChampionChallengeSamples.zip – This zip file contains all of the sample transformations, sample folder layouts and datasets for running the Machine Learning demonstrations and the Machine Learning Model Management samples. This is for demonstration and educational purposes.
PMI_AddingANewScheme.pdf – This document describes the development process of exposing the Multi-Layer Perceptron (MLP) regressor and classifier in the Weka and scikit-learn engines.

REAL! Real-time IoT data stream available for Pentaho Analysis and Visualization

Everyone knows how hard it is to get access to real-time data feeds. Well, here is a chance to access real-time data from a 3D LiDAR motion sensor.

 

 

[Photo: the 8th floor lobby at HLDS]

 

There has been a lot of talk recently about the new 3D LiDAR (Light Detection and Ranging) motion sensor from Hitachi LG Data Systems (HLDS). The 3D LiDAR is a Time of Flight (ToF) motion sensor that calculates distance by measuring the time it takes for an infrared laser pulse to be emitted and for its reflection to return. Because it captures a pixel-by-pixel depth image at 10 to 30 fps (frames per second), it shows the shape, size and position of a human and/or an object in 3D, so it is possible to detect and track the motion, direction, height, volume, etc. of humans or objects.
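
As a rough illustration of the Time of Flight principle (illustrative numbers only, not HLDS's internal calculation): the light pulse travels to the surface and back, so the one-way distance is half the round trip.

    # Time-of-flight distance: one-way distance = (speed of light * round-trip time) / 2
    C = 299_792_458.0  # speed of light, m/s

    def tof_distance_m(round_trip_seconds: float) -> float:
        return C * round_trip_seconds / 2.0

    print(tof_distance_m(20e-9))  # a 20 ns round trip puts the surface ~3.0 m away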

 

Unfortunately, general access to this sensor is a bit difficult to come by at the moment, and setting one up in a useful location, like a bank, retail store or casino, is also a challenge. So, in partnership with HLDS, we have set up a LiDAR configuration in the 8th-floor company lobby at HLDS in Seoul, South Korea, and will make the real-time output stream available for Hitachi Vantara Pentaho developers to use and develop against. The real-time data stream is published from an MQTT broker at the following location.

 

PLEASE NOTE:

There is a problem with the original broker and we have moved this
data stream to a new broker. Please note the new broker URL below.
Sorry for the inconvenience.

 

Broker location – tcp://mqtt.iot-hlds.com:1883

Topic – hlds/korea/TOFData

 

 

An example JSON-formatted data record published from this broker and topic looks like this:

 

[Screenshot: an example JSON record from the TOFData stream]

 

The data stream will be published in clear text. The data is not sensitive. We are looking for real-time dashboards, visuals, analytics and integration transformations.

 

To help you get started, there is a collection of transformations to build from here.

 

 

[Image: the LiDAR's view of the lobby]

 

The setup scenario is a “Human Direction Detection” challenge using the "Human Counter Pro" filter processor. There are two zones monitored by the 2 ceiling-mounted LiDARs (the two LiDARs are grouped together to cover the wide area). The first zone is the entrance area, called “entrance”, and the second zone is the lobby area, called the “hallway”. What can happen in this configuration includes the following scenarios:

 

  • People arrive (out of the elevator) and enter the “entrance” area, then they enter the “hallway” area, and are either walking towards the South Wing doorway or the North Wing doorway. This is the most common scenario and is basically employees arriving on their floor and heading to their work area.
    • This scenario can also happen in reverse order, where people enter the "hallway" from either the North Wing or South Wing and then enter the "entrance", signifying that they are leaving.
  • Someone enters and stays in the “hallway” for a period of time. Someone or others arrive in the entrance area and the group heads to one of the doorways. This scenario is basically an employee waiting for visitors to be escorted to a meeting or other activity.
  • Someone or a group crosses the “hallway” from the South Wing to the North Wing, or from the North Wing to the South Wing. This is a scenario where people are crossing over from one side of the building to the other side.
  • Someone enters the “hallway” area and stays there for a period of time, then heads to one of the doorways. In this scenario, someone is probably looking at one of the live demos or items in the lobby’s display area.
  • There could be other scenarios that you can identify from the LiDAR data; these are just a few that we came up with.

 

 

[Diagram: Human Counter Pro zones in the lobby]

 

 

The published data stream identifies and tracks people as they move into the “entrance” area and then move to the “hallway” area. It includes timing information for when each person enters (Appear) and leaves (Disappear) each zone. Duration time within a zone will need to be calculated yourself.
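
A minimal sketch of that duration calculation follows. The field names (id, event, zone, ts) are hypothetical stand-ins, since the actual record layout is only shown in the example screenshot above; adjust them to the real keys.

    # Pair Appear/Disappear events per person per zone to compute dwell time.
    # The keys "id", "event", "zone" and "ts" are hypothetical stand-ins for
    # the real field names in the published JSON records.
    import json
    from datetime import datetime

    appear_at = {}  # (person id, zone) -> timestamp of the Appear event

    def on_event(payload: bytes):
        rec = json.loads(payload)
        key = (rec["id"], rec["zone"])
        ts = datetime.fromisoformat(rec["ts"])
        if rec["event"] == "Appear":
            appear_at[key] = ts
        elif rec["event"] == "Disappear" and key in appear_at:
            dwell = ts - appear_at.pop(key)
            print(f"person {key[0]} spent {dwell.total_seconds():.1f}s in {key[1]}")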

 

Lastly, remember that South Korea is 16 hours ahead of Pacific Time, so the workday and workweek activity is very skewed: it will be busy during the evening Pacific Time, and it will already be the weekend there on Friday Pacific Time.

 

You can use an MQTT inspection tool like "MQTT Spy" to explore and examine the data coming from the sensor.

 

[Screenshot: MQTT Spy inspecting the TOFData stream]

 

Some background

 

Originally, this was going to be set up just for me; then, since this is an MQTT design, we realized we could open it up company-wide. Access to real-world IoT data is hard to come by.

 

There are other Processor Filters in the LiDAR device middleware suite that provide different functions from the sensor. We are starting with Human Counter Pro because it publishes via MQTT. If this is successful, the other Processor Filters will also be integrated with MQTT as a simple mechanism for connecting Pentaho to the LiDAR sensor, and to future physical sensors and Processor Filters.

 

No special plugin development is required to integrate a state-of-the-art motion sensor with Pentaho. We've had access to MQTT steps for PDI for a few years now. There are a few blogs in the Vantara Community here and here describing how to use MQTT with Pentaho.

 

Some analysis ideas,

 

  • How many people entered the “entrance” only and then “Disappeared” (wrong floor?)?
  • How many people exited from “entrance”?
  • How many people went to North Wing?
  • How many people went to South Wing?
  • How many people crossed the “hallway”?
  • How long did people stay in the “hallway”?
  • At what times of the day are the most people in the “hallway”?
  • Does the time of day matter?
  • What reports, visuals, dashboards and/or real-time dashboards can be created from this data?
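
As one starting point for questions like these (again with hypothetical column names, and assuming a PDI transformation has already landed the events in a CSV file), a quick aggregation in pandas might look like this:

    # Explore landed LiDAR events with pandas. The file name and the columns
    # "event", "zone" and "ts" are hypothetical; adjust to the real layout.
    import pandas as pd

    events = pd.read_csv("lidar_events.csv", parse_dates=["ts"])

    # How many people appeared in each zone?
    print(events[events["event"] == "Appear"].groupby("zone").size())

    # Hallway appearances by hour of day.
    hallway = events[(events["zone"] == "hallway") & (events["event"] == "Appear")]
    print(hallway.groupby(hallway["ts"].dt.hour).size())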

 

Please share what you come up with in the comments section and/or submit your own write-up or blog. Who knows, there might be some recognition in it for you. Enjoy!