Hitachi's Social Innovation goals center on making a difference in the world! This vision honestly inspires awe within me because I actually work for a company that can really save lives using devices (MRI scanners and Proton Beam Therapy) fused to IT (UCP, VSP-G1000 and HCP). I fully believe that these capabilities distance us from our peer group, and our announcement today, extending our family to include Pentaho, continues a pattern of execution towards our vision. Given this new member of our portfolio, I will explore how a user can build an Analytics Cloud supporting the fusion of IT to the Internet of Things that Matter.
The narrative starts with knowledge of developers building latency-intolerant applications in virtual machines (VMs), customers deploying Hadoop in VMs, the emerging shift towards Micro-Service architectures, and some internal testing. There is a fairly clear pattern: the religion of performance at all costs is shifting towards a religion of self-directed, high-performance service provisioning. Why, though, is this possible? Let’s use an experimental result to answer at least one facet of the “why.”
In particular, I want to reveal an internal experiment my team undertook, looking at HCP running as a VM on ESX and consuming our CB500 and HUS-VM All Flash, versus a more traditional HCP configuration with magnetic media. At that time we knew the future of the HCP platform was a split in function: 1. One that supports hot object caching and metadata (HCP G-nodes) and 2. Another focused on persisting content payloads via an S3 addressed Erasure Coded Object Cloud (HCP S-nodes). Therefore we wanted to understand the emergent behaviors of HCP G-Nodes in an all flash configuration. The simplest way for us to get started was to run the G-Nodes as VMs on ESX, comparing all flash and all magnetic configurations.
However, after the initial test results we stopped additional testing, because untuned HCP G-Nodes in a VM, with half the physical nodes and fewer drives, had faster write performance and about 60% of the read performance of a high end all magnetic HCP system. The basic results are in the picture/table included here. I think the most startling point to me is that each HCP node required 48 drives and each VM node required 2 FMDs!
My “ah ha,” and a partial answer to the “why,” is that given technologies like flash, a VM-first approach for demanding applications is OK. Moreover, given that developers already run things like Hadoop, MongoDB and more on their favorite public cloud infrastructures, running advanced data processing engines, like PDI, encased in a VM and/or container is no problem at all.
Score 1 point, or more, for fast self-directed service provisioning!
Now let's return to the primary narrative of the post: building an analytics cloud. Knowing that VM and container encased applications are becoming standard fare for advanced analytics and data processing, how can Hitachi’s users and partners embark on a journey and arrive at an analytics cloud? Well, the key is in leveraging existing HUS-VM or VSP platforms with the UCP bolt-on feature, resulting in a private cloud platform. Essentially, it is possible for our users that own VSP/HUS-VM class technologies to move to a private cloud without incurring a forklift upgrade. I find this amazing because we can help our IT brothers and sisters transform their skills from storage expert to infrastructure guru through a cool but significant capability of our converged solutions. This is a distinction between us and our competition, who frequently require boxes and forklifts to deploy private cloud. Now let's go back to the point on HCP in a VM... As a next step it is possible to take the HCP-VM version (see HCP-VM Deployment Guide) and run it on top of UCP, which literally and figuratively adds S3 object support to an on premises private cloud. The inclusion of HCP-VM is important for two reasons: 1. Applications written for back-up to and consumption of cloud storage using S3 can easily target this Software Defined object storage controller, and 2. Hadoop can also use S3 to persist data into an object cloud with advanced features. (Check out a great paper on the HDS Community which discusses Using HCP with Hadoop.)
The final step is to add a tool that can integrate, analyze and visualize data by controlling Hadoop and potentially other analytic engines. Fortunately, we’re acquiring Pentaho, whose Data Integration Studio and Business Analytics kit can run on top of UCP in a VM for these very purposes. With this done, users can then imagine and implement a veritable smorgasbord of analytics use cases. For example, in “If you Tweet in the US…” Ken Wood blogged about using the Pentaho kit to uncover interesting tidbits from Twitter.
Since Pentaho can dynamically synthesize various metadata schemas, it is possible to combine, say, an XML document on HCP with data analyzed by Hadoop, mixed up with data from a SQL DBMS like MySQL, etc. Essentially the possibilities are endless, and the journey spelled out in this narrative illustrates how it is possible to move from merely block storage to a full-blown analytics cloud supporting Social Innovation through the Internet of Things that Matter.
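To make the idea of blending a little more concrete, here is a minimal sketch in plain Python. It is not Pentaho's API and not how PDI is actually driven; it only illustrates the general shape of joining three differently structured sources on a shared key. Everything in it is an assumption made for illustration: the XML snippet stands in for a document fetched from HCP, the `SCAN_COUNTS` dictionary stands in for the output of a Hadoop job, an in-memory SQLite table stands in for MySQL, and the device names, sites and owners are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document, standing in for an object retrieved from HCP.
DEVICE_XML = """
<devices>
  <device id="mri-01" site="Tokyo"/>
  <device id="mri-02" site="Santa Clara"/>
</devices>
"""

# Hypothetical aggregate, standing in for the output of a Hadoop job.
SCAN_COUNTS = {"mri-01": 120, "mri-02": 45}


def load_owners():
    """Build an in-memory SQLite table standing in for a MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE owners (device_id TEXT PRIMARY KEY, owner TEXT)")
    conn.executemany(
        "INSERT INTO owners VALUES (?, ?)",
        [("mri-01", "Radiology"), ("mri-02", "Oncology")],
    )
    return conn


def blend(xml_text, scan_counts, conn):
    """Join the three sources on the device id, one output row per device."""
    rows = []
    for device in ET.fromstring(xml_text).findall("device"):
        device_id = device.get("id")
        owner_row = conn.execute(
            "SELECT owner FROM owners WHERE device_id = ?", (device_id,)
        ).fetchone()
        rows.append(
            {
                "device_id": device_id,
                "site": device.get("site"),
                "owner": owner_row[0] if owner_row else None,
                "scans": scan_counts.get(device_id, 0),
            }
        )
    return rows


if __name__ == "__main__":
    for row in blend(DEVICE_XML, SCAN_COUNTS, load_owners()):
        print(row)
```

The point of the sketch is only that each source keeps its own native shape (markup, key/value aggregate, relational table) until the moment of the join, which is the kind of work a data integration tool takes off the user's hands.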
So, to net it out, the steps are as follows:
- Starting with an existing or new VSP or HUS-VM, upgrade the system from storage platform to on premises private cloud via UCP’s bolt-on capability. Along the way we can help the users graduate from storage experts to full-on infrastructure gurus.
- Continue the discussion about an on-prem cloud by adding HCP-VM on top of UCP, and initiate discussions about an emerging trend in the public cloud (e.g. AWS): running infrastructure and analytics engines in VMs, because there really is negligible performance impact. This is exactly the possibility that we can bring to the table with HCP on UCP and...
- Once more, return to talk about Hadoop as an advanced analytics engine, potentially bringing in our friends from Hortonworks. An alternative is to merely download Apache Hadoop and integrate it with HCP. Remember the term “one platform for all data”? Well, here is a manifestation of it, albeit in a virtual context.
- Finally return and engage with the Pentaho team to give them the full analytics cloud experience from imagination to discovery, data integration, analysis and visualization.
In many respects this journey is a key puzzle piece in achieving our future, now and tomorrow. As to what kinds of applications can be built on top of these infrastructures: on her blog, Sara Gardner has written a companion to this post, peeking behind the curtains to uncover how the Pentaho Community leverages Data Blending, and Greg Knieriemen discusses the Open Source effect and Pentaho's participation there in the era of Advanced Analytics.
- Hadoop on Coffee Drink - Author: Flickr: yukop's Photostream, License: Creative Commons — Attribution-ShareAlike 2.0 Generic — CC BY-SA 2.0
- Software Package Icon - Author: Rogerio de Souza Santos, License: The GNU General Public License v3.0- GNU Project - Free Software Foundation, URL: H2O Icon Theme KDE-Look.org
- Advanced Network Image - Author: Flickr: Marc_Smith's Photostream, License: Creative Commons — Attribution 2.0 Generic — CC BY 2.0