
“New Hey Ray!” or “Hey Shin Ray!” or “Hey Ray!”

HeyShinRayVideoDemo3 - YouTube (click this link to watch a video of "Hey Shin Ray!" in action on YouTube)

 

 

by Ken Wood

 

HVLabs_RdI_Logo.png
The word 'shin' means 'new' in Japanese.

 

Back in November, Mark Hall and I shared with you a uniquely interactive demonstration that used the Plugin Machine Intelligence (PMI) plugin's new feature, Deep Learning with DL4j, to analyze x-ray film through voice commands and speech responses. This original demonstration was featured at Hitachi NEXT 2018 in September and was used to show how Pentaho could apply deep learning in an application.

 

The apparatus used to build the original “Hey Ray!” had multiple practical functions. First, it was easy to build, which mattered because we were up against a deadline to get “Hey Ray!” running in time for the conference. Second, “Hey Ray!” is an example of how to use PMI, deep learning, Pentaho, speech recognition, IoT, text-to-speech and a Raspberry Pi in a uniquely integrated way to solve a particular problem and demonstrate the power of deep learning and Pentaho. And finally, the steampunk-themed design, or as I like to say “… a Jules Verne inspired artificial intelligence”, let everyone interested in this application understand that this is not a product or solution to be purchased, but an example of how simple it is to build advanced solutions with Pentaho and the new Hitachi Vantara Labs plugin, PMI.

HeyRayEvolution.png

As you can imagine, there were a couple of snafus during the conference. Speech recognition in a crowded, noisy environment is challenging at best. So, to overcome the loud ambient noise, I quickly built a remote control on my iPhone to mimic the voice command set that “Hey Ray!” used. Since we used the IoT protocol MQTT, it was easy to slide commands into the demonstration apparatus through a remote command queue. Another hiccup was the physical x-ray film. I only had 16 usable x-ray films (purchased off eBay) to choose from, so rotating through these physical x-rays started to get a little monotonous. While this wasn’t a major problem, and it flowed nicely with the whole interactive steampunk theme of the demonstration, x-ray film is a little too antiquated, and additional access to digital x-ray images would improve the overall experience. Lastly, the initial size and "non-slickness" of the physical apparatus and the number of components made it difficult to travel with. Since I live in San Diego, the location of the 2018 conference, it was easy for me to transport and set up this demonstration.
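For anyone curious how that remote control worked, here is a minimal sketch of the idea in Swift: publish the same text a voice command would have produced onto the MQTT topic the apparatus listens on. It assumes the third-party CocoaMQTT library, and the broker address and topic name are made up for illustration.

import CocoaMQTT

// Minimal "remote control" sketch: publish a command string to the demo's
// MQTT command queue. Broker host and topic name are hypothetical.
let mqtt = CocoaMQTT(clientID: "heyray-remote", host: "demo-broker.local", port: 1883)
_ = mqtt.connect()   // connect once at startup (delegate callbacks omitted for brevity)

func send(command: String) {
    // The apparatus subscribes to this topic and treats the payload
    // as if it had come from speech recognition.
    mqtt.publish(CocoaMQTTMessage(topic: "heyray/commands", string: command))
}

// e.g. send(command: "analyze the x-ray")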

 

So, with that, I started on two major upgrades; the first was to reduce the size of the demo to something that could fit in one Pelican shipping case (or smaller), and the second was to change the format of the “Hey Ray!” demonstration. I only used the first upgrade once and it was fine, but it was still bulky and still only used physical x-ray film.

 

Over the 2018-2019 holidays I learned Apple Swift and created an iPhone application that uses the internal camera or the internal photo library as the source of x-ray images. Overall, this is so much better. I can browse the internet looking for interesting x-ray images and save them to the photo library, or I can use the internal camera to take live pictures of physical x-rays for the more interactive experience we were originally looking for. The resulting analysis is still a text-to-speech response, but I have dropped the speech recognition portion for the time being. However, since the Swift environment has access to the Siri framework, I could still incorporate speech recognition into the overall application later.
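As a taste of how little Swift the spoken response takes, here is a minimal sketch of the text-to-speech piece using Apple's standard AVSpeechSynthesizer; the sample sentence is just a placeholder for whatever the analysis returns.

import AVFoundation

// Speak an analysis result on the iPhone using the built-in speech synthesizer.
final class AnalysisSpeaker {
    // Keep a strong reference; speech stops if the synthesizer is deallocated mid-utterance.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ analysisText: String) {
        let utterance = AVSpeechUtterance(string: analysisText)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        synthesizer.speak(utterance)
    }
}

// e.g. AnalysisSpeaker().speak("I see an elbow, and I do not detect a fracture.")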

NewArchitecture.png

The iPhone application is basically a user interface to the analytics server. There is still a Pentaho server running Pentaho Data Integration (PDI) and using PMI's deep learning to analyze the incoming image. In fact, the original analytic transformation is mostly the same, and the two deep learning models used to detect injuries and identify the body part being analyzed are the same as in the original. A text analysis is formed from the findings and sent back to the iPhone, where the results are spoken using the iPhone voice synthesizer, the same voice synthesizer that Siri uses. The iPhone-based application is still able to store all analysis artifacts to a Hitachi Content Platform (HCP) system, now including custom metadata (the results of the deep learning analysis: body part and probability, and injury detection and probability), and it can tweet a movie of the analysis - a rendering of the audio analysis and the x-ray image - to Twitter.
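To make the round trip concrete, here is a rough sketch of sending an x-ray image from the phone to the analytics server and handing the returned text to the speech code above. The endpoint URL and the plain-text response format are assumptions for illustration; the actual transformation may expose a different interface.

import UIKit

// POST a JPEG of the x-ray to the analytics server and return the analysis text.
func analyze(xray image: UIImage, completion: @escaping (String) -> Void) {
    guard let jpeg = image.jpegData(compressionQuality: 0.9),
          let url = URL(string: "http://pentaho-server.local:8080/heyray/analyze") // hypothetical endpoint
    else { return }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("image/jpeg", forHTTPHeaderField: "Content-Type")

    // The server-side PDI/PMI transformation scores the image and (here, assumed)
    // returns the spoken-analysis text as plain text in the response body.
    URLSession.shared.uploadTask(with: request, from: jpeg) { data, _, error in
        guard error == nil, let data = data,
              let text = String(data: data, encoding: .utf8) else { return }
        completion(text)   // e.g. pass to AnalysisSpeaker above
    }.resume()
}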

 

I am planning to build similar iPhone applications with other datasets to use as demonstrations and as examples of how to build artificial intelligence-based applications with Pentaho and PMI, and possibly other Hitachi Vantara Labs experimental prototypes. Stay tuned for more...

Blog 1 of 2

AllTweets.jpg

 

I have a few pent-up blogs that need to be written, but I’ve been waiting for a couple of Pentaho User Meeting speaking events to finish before I share them. I don’t want to "steal my own thunder", if you know what I mean.

 

I did start tweeting some thoughts the other day on one of these viewpoints – hidden results from machine learning models – which led me to start writing this first of two blogs on the matter.

 

Much of this idea comes from the “Hey Ray!” demonstration that uses deep learning in Pentaho with the Plugin Machine Intelligence plugin. In this demonstration, occasionally a “wrong” body part would be identified. I say wrong because I personally see a “forearm” but “Hey Ray!” sees an “elbow”. In working with this example application and the interpretation of the results, it began to dawn on me: “maybe the neural networks in this transformation do 'see' a forearm”. Technically, both answers, forearm and elbow, are correct. Then you have to ask yourself, “what is the dominant feature of the image being analyzed?” or, more importantly, “what image perspective is being analyzed?”. When looking at the results from the deep learning models used, you can see that both classes score high as a probability output. So what are you supposed to do with no "clear" result?

 

ElbowForearm.jpg

 

This is where hidden results come in. In my talks, I typically describe starting and using machine learning on problems and datasets that can be described with 2 classes, or binary classes: "Yes or No", "True or False", "Left or Right", "Up or Down", and so on. Ask a question about the data, for example, “is this a fraudulent transaction, Yes or No?”. With supervised machine learning, you have the answers from historical data. You are just training the models on the patterns in the historical data so they can recognize future fraudulent transactions in live or production data.

 

Here's the issue: there could actually be three possible answers from binary classes – "Yes, No or Unknown". However, you have to interpret the Unknown answer with thresholds and flag the implied Unknown results. Typically, the Type:Yes_Predicted_prob outcomes from a machine learning model (assuming you are using PMI with Pentaho) are based on the halfway point – 0.5 or 50%. Any prediction above the 50% line is the Yes class and any prediction below the 50% line is the No class. This means that a record with a Type:No_Predicted_prob of 0.49, or 49%, is still labeled Yes even though the prediction is essentially a coin flip. These statistical “ties” need to be contained in a range, and this range needs to be defined as your policy.
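As a minimal sketch of that policy in Swift (the enum and the band boundaries are illustrative placeholders you would define yourself, not part of PMI), the interpretation might look like this:

// Turn a binary model's probability output into three answers instead of two.
// `probYes` stands in for the Type:Yes_Predicted_prob value from the PMI
// Scoring step; the band boundaries are a policy choice, not a model output.
enum Prediction {
    case yes, no, unknown
}

func interpret(probYes: Double, noBelow: Double, yesAbove: Double) -> Prediction {
    if probYes >= yesAbove { return .yes }   // confidently Yes
    if probYes <= noBelow  { return .no }    // confidently No
    return .unknown                          // statistical "tie" - flag it
}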

 

For example, you could test and retest your models and define a range of results for Type:Yes_Predicted_prob or Type:No_Predicted_prob and the extrapolated Unknown results. For the diabetes example below, the predicted results would be:

 

  • 4 predicted “yes” diabetic,
  • 4 predicted “no” diabetic, and
  • 3 “unknown” or “undetermined” predictions.

 

The figure below shows the results of a PMI Scoring step on the last 11 case-study records from the Pima Indians diabetes dataset.

 

NOTE:

For experimental purposes, I split the 768 records of the diabetes case study dataset into 3 separate datasets: a training dataset, a validation dataset and a “production" dataset. For the production dataset, I use the last 11 records, remove the “outcome” column (the study’s results), and score this dataset to produce the results you see below.
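For readers who want to reproduce the note above outside of PDI, here is a rough sketch of the three-way split, assuming the 768 rows have already been parsed into string arrays (header removed) with the outcome column last; the 80/20 training/validation proportion is my own assumption, not something stated in the study.

// Split the diabetes rows into training, validation and "production" sets.
// The last 11 rows become the production set, with the outcome column dropped.
func split(rows: [[String]]) -> (training: [[String]], validation: [[String]], production: [[String]]) {
    let production = rows.suffix(11).map { Array($0.dropLast()) }   // drop the "outcome" column
    let remaining = Array(rows.dropLast(11))
    let cut = Int(Double(remaining.count) * 0.8)                    // assumed 80/20 train/validation split
    return (Array(remaining.prefix(cut)), Array(remaining.suffix(from: cut)), production)
}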

 

ResultsInterpretation2.png

ResultsInterpretationKTR.png

 

This example illustrates that any result predicted with a probability score between 0.30 and 0.70 (30% to 70%) should be interpreted as Unknown, while the top 30% and bottom 30% ranges should be used to define your Yes or No results. A record needs a predicted Yes score of 0.70 or higher to be interpreted as a Yes, and a score of 0.30 or lower to be interpreted as a No; all other results should be considered Unknown.
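Plugging those numbers into the interpret sketch from earlier (with made-up probability values, not the actual scoring output) gives:

// Apply the 0.30 / 0.70 policy to a few illustrative Type:Yes_Predicted_prob values.
let scores = [0.83, 0.12, 0.49, 0.51, 0.70]
let outcomes = scores.map { interpret(probYes: $0, noBelow: 0.30, yesAbove: 0.70) }
// -> [yes, no, unknown, unknown, yes]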

 

If you only used the type_MAX_prob to determine the outcome, the results would be:

 

  • 5 predicted “yes” diabetic, and
  • 6 predicted “no” diabetic.

 

So, use thresholds and ranges to define your results from machine learning models. Do not just go by the type_MAX_prob, Type:No_Predicted_prob or Type:Yes_Predicted_prob. By using thresholds, you will discover the hidden results in binary classes.

 

Hidden results are more pronounced when working with datasets with more than 2 classes. Multi-class datasets may actually include combinations of answers as well as unknown results. This will be the subject of the next blog. Stay tuned!

PMIDLLogo.png