“New Hey Ray!” or “Hey Shin Ray!” or “Hey 新 Ray!”
(click on this link to watch a video of "Hey Shin Ray!" in action)
by Ken Wood
Back in November, Mark Hall and I shared with you a uniquely interactive demonstration that used the Plugin Machine Intelligence (PMI) plugin's new deep learning capability, built on DL4j, to analyze x-ray film through voice commands and spoken responses. The original demonstration was featured at Hitachi NEXT in September 2018 and showed how Pentaho could put deep learning to work in an application.
The apparatus used to build the original “Hey Ray!” served several practical purposes. First, it was easy to build, which mattered since we were up against a deadline to get “Hey Ray!” running in time for the conference. Second, “Hey Ray!” is an example of how to combine PMI, deep learning, Pentaho, speech recognition, IoT, text-to-speech and a Raspberry Pi into a uniquely integrated application that solves a particular problem and demonstrates the power of deep learning and Pentaho. And finally, the steampunk-themed design, or as I like to say, “… a Jules Verne inspired artificial intelligence,” made it clear to everyone interested that this is not a product or solution to be purchased, but an example of how simple it is to build advanced solutions with Pentaho and the new Hitachi Vantara Labs plugin, PMI.
As you can imagine, there were a couple of snafus during the conference. Speech recognition in a crowded, noisy environment is challenging at best, so to overcome the loud ambient noise I quickly built a remote control on my iPhone that mimicked the voice command set “Hey Ray!” used. Since we were already using the IoT protocol MQTT, it was easy to slip commands into the demonstration apparatus through a remote command queue. Another hiccup was the physical x-ray film. I only had 16 usable x-ray films (purchased off eBay) to choose from, so rotating through them started to get a little monotonous. While this wasn't a major problem, and it flowed nicely with the whole interactive steampunk theme of the demonstration, x-ray film is a little too antiquated, and access to digital x-ray images would improve the overall experience. Lastly, the size and "non-slickness" of the physical apparatus and the number of components made it difficult to travel with. Since I live in San Diego, the location of the 2018 conference, transporting and setting up the demonstration was easy that time.
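The remote-control idea is simple to sketch: wrap each spoken-command equivalent as a message and publish it to an MQTT topic that the demo rig subscribes to. Here is a minimal Python sketch of that pattern using the paho-mqtt client; the topic name, payload fields and broker address are my assumptions, not the ones used in the actual "Hey Ray!" apparatus.

```python
import json

# Hypothetical topic the demonstration apparatus would subscribe to.
COMMAND_TOPIC = "heyray/commands"

def build_command(command: str) -> str:
    """Wrap a voice-command equivalent as a JSON MQTT payload."""
    return json.dumps({"source": "iphone-remote", "command": command})

def send_command(command: str, broker: str = "localhost") -> None:
    """Publish the command to the remote command queue.
    Requires the paho-mqtt library (pip install paho-mqtt)."""
    import paho.mqtt.publish as publish
    publish.single(COMMAND_TOPIC,
                   payload=build_command(command),
                   hostname=broker)  # broker address is an assumption

if __name__ == "__main__":
    # Build (but don't send) a sample command payload.
    print(build_command("analyze"))
```

Because MQTT decouples publisher and subscriber through the broker, the apparatus needed no changes to accept commands from the phone instead of the microphone.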
So, with that, I started on two major upgrades: the first was to reduce the size of the demo to something that could fit in one Pelican shipping case (or smaller), and the second was to change the format of the “Hey Ray!” demonstration. I only used the first upgrade once; it worked fine, but it was still bulky and still relied on physical x-ray film.
Over the 2018-2019 holidays I learned Apple Swift and created an iPhone application that uses either the internal camera or the photo library as the source of x-ray images. Overall, this is so much better. I can browse the internet for interesting x-ray images and save them to the photo library, or I can use the camera to take live pictures of physical x-rays for the more interactive experience we were originally looking for. The resulting analysis is still delivered as a text-to-speech response, but I have dropped the speech-recognition portion for the time being. Since the Swift environment has access to the Siri framework, though, I could still incorporate speech recognition into the application later.
The iPhone application is essentially a user interface to the analytics server. There is still a Pentaho server running Pentaho Data Integration (PDI), with PMI applying deep learning to analyze the incoming image. In fact, the analytic transformation is mostly unchanged, and the two deep learning models, one that detects injuries and one that identifies the body part being analyzed, are the same as in the original. A text analysis of the findings is composed and sent back to the iPhone, where the results are spoken using the iPhone's voice synthesizer, the same one Siri uses. The application can still store all analysis artifacts on a Hitachi Content Platform (HCP) system, now including the ingestion of custom metadata (the results of the deep learning analysis: body part and probability, and injury detection and probability), and it can tweet a movie of the analysis, a rendering of the audio analysis over the x-ray image.
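Composing the spoken findings from the two model outputs could look something like the sketch below. The field names and phrasing are my assumptions for illustration, not the actual PDI transformation; the same fields could double as the custom metadata stored alongside the image on HCP.

```python
def compose_findings(body_part: str, body_prob: float,
                     injury_detected: bool, injury_prob: float) -> str:
    """Turn the two model outputs (body part and injury detection,
    each with a probability) into a sentence for text-to-speech."""
    part = f"This appears to be a {body_part} ({body_prob:.0%} confidence)."
    if injury_detected:
        verdict = f"An injury is likely ({injury_prob:.0%} confidence)."
    else:
        verdict = f"No injury detected ({injury_prob:.0%} confidence)."
    return f"{part} {verdict}"

def build_metadata(body_part: str, body_prob: float,
                   injury_detected: bool, injury_prob: float) -> dict:
    """Custom metadata fields to ingest with the image artifact."""
    return {
        "body_part": body_part,
        "body_part_probability": body_prob,
        "injury_detected": injury_detected,
        "injury_probability": injury_prob,
    }

if __name__ == "__main__":
    print(compose_findings("hand", 0.97, True, 0.88))
    # → This appears to be a hand (97% confidence). An injury is likely (88% confidence).
```

Keeping the spoken sentence and the stored metadata derived from the same four values means the archived artifact on HCP is always searchable by exactly what the demo said aloud.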
I am planning to build similar iPhone applications with other datasets to use as demonstrations and as examples of how to build artificial intelligence based applications with Pentaho and PMI, and possibly other Hitachi Vantara Labs experimental prototypes. Stay tuned for more...