
Storage Systems


According to Wikipedia, a “creator” is “something or someone who brings something into being.” With today’s digital technology and tools, it’s never been a better time to be a creator. With businesses racing to be the next disrupter, there have never been more tools available to accelerate innovation. Every so often a “process” change is so significant that it becomes much more than that: it becomes a broad, sweeping movement or even a culture. That is precisely what is happening with DevOps. For those who don’t know DevOps, at its core it enables businesses to accelerate the development and release of products by applying agile and lean principles to software development. The rate of business change is the main driver for DevOps, and that’s why its adoption has spread like wildfire. The movement started with small companies and is now driving process and cultural change in large enterprises. But it’s not all about DevOps per se: the tools available to software developers and IT engineering enable DevOps teams to develop, maintain, and release new products and features at a dizzying rate. Let’s discuss some of the most significant tools that are changing how software development is done and how IT environments are deployed.


DevOps benefits include:

  • Accelerated business innovation overall
  • A deployment process that scales and is both repeatable and reliable
  • An integrated process for capturing and measuring data points on quality
  • Built-in propagation of process benefits
  • Ownership of both development and operations that extends to the production environment and customer experience


There’s no “Me” in “TEAM”…

Uh… well, there may be a “me” in “team,” but there’s also an “us” in DevOps : )  Remember, it’s not just a process, it’s a culture! And it’s never been easier to be part of the “us,” because everyone can work on their own code or module and check in at any time.  Consequently, the DevOps model is flexible, enabling updates and enhancements at any point in this perpetual process.  Think of the development team as a collective driving higher synergy through interaction.


Tools of the Trade - Containers Enabling DevOps


There is a huge amount of hype around “containers,” heralded as the virtual machine of the new millennium. There’s good reason for the excitement, as containers offer several huge benefits, like the ability to run an application in a lightweight “container” totally agnostic of the underlying OS.  But that’s not all.


Containers are super tools for DevOps for the following reasons:

  • Velocity – Containers spin up faster (an average of 500 ms, compared to 2-7 minutes for a virtual machine), enabling developers to test and retest quickly
  • Easy - Simplified, quicker code integration of multiple modules
  • Accelerated delivery pipeline and code release process
  • Develop one version, run it on anything, anywhere


OpenStack and DevOps
Transparency, sharing, and open exchange are embedded in the DevOps culture and process. It’s no surprise that OpenStack is finally finding its place within DevOps.  OpenStack is maturing and has developed a fast-growing ecosystem. It enables the DevOps community by providing an open, affordable, and stable platform upon which to deploy “containerized” apps. OpenStack offers robust data services and leverages open APIs for simple, standardized application development. The OpenStack block storage service is called “Cinder,” and a growing number of enterprises are adopting it as part of their DevOps repertoire.

So, as this is a storage-related blog, you may be asking how enterprise storage plays in to support containers, OpenStack, and the overall DevOps initiative.  Well, here’s where the rubber meets the road.


What about Storage for Containers?
Good question.  Containers are light and agile.  Agile meaning they can be spun up and deleted so quickly that they have a disposable, “ephemeral” quality.  This is fine for cloud-native applications that deliver search results and then delete the container, but what about traditional applications like databases, which require “persistent” storage?  Luckily, storage vendors like Hitachi Vantara offer plugins to their storage OS to enable a persistent connection between the container and the storage. The Hitachi Storage Plug-in for Containers provides connectivity between Docker containers and Hitachi VSP storage platforms. With the plug-in, Hitachi customers can deliver shared storage for Docker containers that persists beyond the timeline of a single Docker host. This is essential for enabling DevOps and agile IT. It also makes stateful application development available for container-based workloads. With a persistent connection to the storage, containers can be protected with high availability, replication, and snapshots. As containerized apps make their way into mission-critical applications, enterprise-class data protection capabilities will be required.


What about Storage for OpenStack?

As mentioned earlier, OpenStack Cinder provides a REST API to exchange block data with the storage. Leading storage providers like Hitachi Vantara offer a driver, like the Hitachi Block Storage Driver Plugin, to enable enterprises to leverage their existing storage for OpenStack. The benefits are similar to those of the container driver in that it opens up rich storage services for OpenStack-based applications.
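For context, a Cinder backend is typically wired up in cinder.conf. The enabled_backends, volume_backend_name, and volume_driver options below are standard Cinder settings, but the section name and the driver class path shown here are placeholders; the exact strings come from the Hitachi driver’s documentation:

```ini
# /etc/cinder/cinder.conf (illustrative fragment only; the driver class
# path and array-specific values are placeholders from vendor docs)
[DEFAULT]
enabled_backends = hitachi_vsp

[hitachi_vsp]
volume_backend_name = hitachi_vsp
volume_driver = <Hitachi block storage driver class, per the driver documentation>
```

Once the backend is registered, applications consume volumes through the same Cinder REST API regardless of which vendor’s driver sits underneath.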


The Takeaway – Roll with the Changes

So, keep DevOps on your radar; in fact, you may want to get in front of the wave by initiating some cultural and process changes before your competitor does.  Luckily, Hitachi Vantara is here to help you leverage storage solutions that support DevOps and help you win the game.  So go ahead, make some DevOps noise and disrupt your competition.


Learn About DevOps

Paula Phipps’ Blog - The Super Powers of DevOps to Transform Business


More great blogs in the Data Center Modernization Series here:

Hu Yoshida's blog - Data Center Modernization - Transforming Data Center Focus from Infrastructure to Information

Nathan Muffin's blog – Data Center Modernization

Mark Adams's blog - Infrastructure Agility for Data Center Modernization

Summer Matheson's blog - Bundles are Better

Richard Jew's blog - AI Operations for the Modern Data Center


We’ve all heard the cliché, “If it sounds too good to be true, it probably is.”  In many situations this turns out to be the case, but I’m not writing this blog to throw shade at data reduction.  Actually, data reduction is an amalgamation of super-useful technologies like compression and deduplication, which yield additional “effective” storage capacity out of your existing usable capacity. This proposition is especially attractive when it comes to all-flash storage because data reduction can significantly lower your cost per GB.  But there are questions you should be asking, and that’s what I want to call out.  As a starting point, it is important to understand the definitions used in data reduction marketing jargon, so here’s a primer.

  • Data Reduction – Compression and deduplication technologies
  • Total Efficiency – Compression, deduplication, snapshots, thin provisioning
  • Raw Capacity – Total disk space available to the array
  • Physical Capacity – Capacity available after formatting the media
  • Usable Capacity - Physical capacity available after factoring RAID data protection overhead and spare drive capacity
  • Effective Capacity – Usable capacity available after deduplication and compression are applied
  • Free Capacity - Unassigned space in a volume group or disk pool that can be used to create volumes
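To see how these definitions stack, here is a quick back-of-the-envelope walk from raw to effective capacity. Every number below (formatting loss, RAID overhead, spares, a 2:1 reduction ratio) is an illustrative assumption, not a vendor figure:

```python
# Illustrative walk from raw capacity to effective capacity.
# All percentages and ratios are assumptions for the example only.
raw_tb = 100.0                       # raw: total disk space in the array
physical_tb = raw_tb * 0.97         # physical: after media formatting (~3% loss assumed)
usable_tb = physical_tb * 0.75 - 2  # usable: assumed RAID-6 (6+2) overhead, minus 2 TB of spares
reduction_ratio = 2.0               # effective: assume a conservative 2:1 data reduction
effective_tb = usable_tb * reduction_ratio

print(f"usable: {usable_tb:.2f} TB, effective: {effective_tb:.2f} TB")
```

The point of the arithmetic is that “effective” capacity is a multiplier on usable capacity, which is already well below the raw number on the spec sheet.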


So you may be thinking: if compression and deduplication are the basis for data reduction, is one vendor’s compression and deduplication better than another’s? Flash vendors use very similar compression and deduplication technology.  For example, most vendors use LZ77-based compression algorithms, a proven technology.  Deduplication schemes don’t vary much either, as the premise is the same: identify patterns and eliminate duplicate copies of data.  A pointer to the original data is inserted where a duplicate exists.  Sooo… this leads to the question, “Won’t all vendors’ results be the same if they all use the same compression and deduplication technology for data reduction?”  The answer is yes: if not quite the same, the results yielded will be very similar.
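The deduplication premise just described (keep one copy of each unique chunk and point duplicates at it) fits in a few lines of code. Real arrays do this at the block level in firmware, so treat this purely as a toy model:

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Toy fixed-chunk dedup: store each unique chunk once,
    and keep an ordered list of pointers (hashes) to rebuild the data."""
    store = {}      # hash -> unique chunk, stored once
    pointers = []   # one hash per original chunk
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)   # duplicate chunks are not stored again
        pointers.append(h)
    return store, pointers

def rebuild(store, pointers):
    """Reassemble the original data by following the pointers."""
    return b"".join(store[h] for h in pointers)

# Ten identical 4 KiB chunks reduce to one stored chunk plus ten pointers.
data = b"A" * 4096 * 10
store, ptrs = dedupe(data)
print(len(store), len(ptrs))  # 1 unique chunk, 10 pointers
```

The stored footprint is one chunk instead of ten, yet the data reassembles byte for byte, which is exactly the trade vendors are making.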


4 key considerations when evaluating data reduction:


24:1 is the most, baby!  But this “bug” isn’t going to win any races…


The first is performance, or overall system IOPS.  If you have already settled on an all-flash solution, there’s a good chance the reason was to deliver better performance and quality of service to your customers and the business.  Regardless of what vendors claim, compression and deduplication can adversely affect system performance because those data reduction operations need to be handled either in silicon or in software.  This is called the “data-reduction tax.”  How big a tax you pay is directly related to how efficiently the vendor has implemented the solution. Moreover, everyone’s environment and use case is different, so results can vary widely regardless of vendor claims and guarantees.


The second big-ticket item to consider is the type of data you are going to be reducing.  The fact of the matter is that some data types compress very well and others don’t at all. For example, remember the first time you tried to “zip” a PowerPoint presentation because the darn email system wouldn’t allow attachments over 10MB?  After swearing and cussing at Outlook, you thought that zipping that big honkin’ presentation would be the answer.  Then you learned that zipping that .ppt got you nothing!  That was a rookie move.  What you should know is that databases and VDI data compress very well.  Audio, video, and encrypted data don’t compress well at all, so there would be very little data reduction benefit for those data types.  The point I am making is that your data reduction benefit is going to vary directly based on your data type.
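You can reproduce this effect with any LZ-family compressor. Here, Python’s zlib (DEFLATE, which is built on LZ77) squeezes repetitive, database-like text dramatically but gains nothing on random bytes, which stand in for encrypted or already-compressed media:

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compression ratio: original size / compressed size."""
    return len(data) / len(zlib.compress(data))

text = b"customer_id,order_id,status\n" * 2000   # repetitive, database-like rows
noise = os.urandom(len(text))                    # stands in for encrypted/media data

print(f"text  ratio: {ratio(text):.1f}:1")   # compresses very well
print(f"noise ratio: {ratio(noise):.2f}:1")  # barely compresses at all
```

Same algorithm, same array, wildly different “effective” capacity, depending purely on what you feed it.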


The third item I’m going to ask you to consider is whether your vendor gives you a choice to configure your storage volumes with or without data reduction.  If the answer is “no,” you should be concerned, and here’s why.  In a “data reduction always on” scenario, you don’t have the choice or ability to balance data reduction with performance. This may be fine if your application can tolerate the latency inherent in that scenario, but in most cases all-flash arrays are purchased to break performance barriers, not introduce them. I must point out that with the new Hitachi VSP Series and Storage Virtualization Operating System RF, the user has a choice to balance flash performance and efficiency right down to the LUN level.  The result is a bespoke balance of IOPS and data efficiency tuned for each individual environment.


The fourth is probably the most important attribute, and one few vendors are willing to discuss.  Is the data reduction guarantee or claim backed up by an availability guarantee? Did you know that “availability” is the number one selection criterion when purchasing an all-flash array?  What good is a 7:1 efficiency ratio if you can’t get to the data?  Hitachi Vantara stood out from the crowd by being the first to offer a 100% Data Availability Guarantee with its Total Efficiency Guarantee.


So, in closing, here’s what I suggest when evaluating data reduction claims.  Don’t be fooled by the “my number is bigger than your number” claims.  The results you will see are highly dependent on your data and workload.  Work with the vendor to assess your environment with a sizing tool that provides a realistic expectation of results. Consider that you may not want to run compression and deduplication on certain workloads, to maximize performance. You will want the choice to turn data reduction on or off on different volumes within the same array. Also, beware of any vendor that promises you maximum flash performance with the highest data reduction ratios, because if it sounds too good to be true, it probably is.


For More Info on Hitachi Vantara Investment Protection and Total Data Efficiency:

Hitachi Vantara Up to 4:1 Total Efficiency Guarantee

Hitachi Vantara 100% Data Availability Guarantee

Hitachi Vantara Flash Assurance Program




Enterprises make copies of their critical data sets for assorted reasons: a copy for backup and fast, local recovery; a copy in one or two other locations for business continuity and disaster recovery; copies for the test and development teams; copies for finance and legal; and so on.


If these copies aren’t automated, controlled and secure, they can become costly and a serious liability.


Let’s start with the basics and walk through an example of how Hitachi Vantara, through the use of Hitachi Data Instance Director (HDID) and the range of technologies that it orchestrates, can help organizations automatically create, refresh and expire copy data.


Our main data center is in New York. In it, we have a production application server; let’s say it’s an Oracle database environment. The Oracle data is stored on enterprise-class storage, in this case a Hitachi Virtual Storage Platform (VSP) F series all-flash array.


Now we need to make a periodic copy of the data for local backup and recovery. The old method of taking an incremental backup each night and a full backup on the weekend doesn’t work anymore. Backups take too long, often many hours to complete. And they leave too much data at risk: a nightly backup means a recovery point objective (RPO) of 24 hours, which means as much as a full day’s worth of data is at risk of loss. Neither of these is an acceptable service level for critical applications and data.


So instead, we’ll take an hourly application-consistent snapshot using Hitachi Thin Image, which is part of the storage system’s Storage Virtualization Operating System (SVOS). The snapshot can be created as frequently as needed, but once an hour already improves your RPO and reduces the amount of data at risk by more than 95%.
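The 95% figure is simple arithmetic: moving from one nightly backup to hourly snapshots shrinks the worst-case window of lost data from 24 hours to 1:

```python
# Worst-case data-loss window (RPO) before and after hourly snapshots.
nightly_rpo_hours = 24   # one backup per day: up to a full day at risk
hourly_rpo_hours = 1     # one snapshot per hour: at most an hour at risk

reduction = 1 - hourly_rpo_hours / nightly_rpo_hours
print(f"data-at-risk window reduced by {reduction:.1%}")  # ~95.8%
```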


Next, we also have a data center in Boston, so we set up replication to another VSP there to enable business continuity. Since the latency between the sites is low, we can use active-passive synchronous replication (Hitachi TrueCopy), guaranteeing zero data loss. Or, we can support an active-active configuration to enable always-on operations, using the VSP’s global-active device storage clustering feature.


We can also have a third site, let’s say London, connected by asynchronous replication using Hitachi Universal Replicator, to protect against a major regional outage such as the power blackout that impacted the northeastern United States in 2003. Most areas did not get power restored for more than two days, and businesses that did not have a disaster recovery site outside the impact zone were severely affected.


Flexible three-data-center (3DC) topologies are supported, including cascade and multi-target. An additional feature called delta resync keeps the third site current even when one of the two synchronized sites goes offline.


Now that our data is completely protected from terrible things happening, we want to create additional copies for other secondary purposes, such as dev/test, finance, long-term backup, etc.


We can create space-efficient virtual copies with our snapshot technology. Or we can create full-copy clones using Hitachi ShadowImage. Either way, they are created almost instantaneously with no impact on the production systems. When needed, the copy is mounted to a proxy server and made available to the user.


All of these copy operations may require multiple tools, complex scripting and manual processes. But with Hitachi Data Instance Director, we offer a way to automate and orchestrate all of it, combining these steps into a single policy-based workflow that is very easy to set up and manage.


3DC Architecture.png


We can then take this automation to the next level, by creating service-level based policy profiles. For example, think of gold, silver and bronze services, which are selected based on business needs for the particular application. These profiles can determine the frequency of protection, the tier of storage to use, user access rights, retention, etc.


Everything we’ve talked about can be easily tied into the Hitachi approach to data center modernization. For example, as Hitachi Automation Director (HAD) is provisioning the resources needed to spin up a new application workload, it can automatically provision the correct tier of data protection services at the same time. The communication between HAD and HDID is via a robust RESTful API.


In the near future, Hitachi Infrastructure Analytics Advisor and Hitachi Unified Compute Platform Advisor will be able to monitor HDID and recommend opportunities to improve copy management processes.


To learn more about Hitachi Vantara's approach to modernizing data center operations, check out these blogs:



Rich Vining is a Sr. WW Product Marketing Manager for Data Protection and Governance Solutions at Hitachi Vantara and has been publishing his thoughts on data storage and data management since the mid-1990s. The contents of this blog are his own.

Data center modernization isn’t complete without the right IT Operations Management (ITOM) tools to ensure your data center is running smoothly.  Today’s data center operations are under constant change, with new systems, technologies and applications being added, moved and fine-tuned.  Most ITOM tools have a domain-specific view into the infrastructure that can be further restricted by vendor-specific approaches.  If you’re looking at a silo view of your data center, it can be difficult to ensure your applications are running at peak performance across all the various infrastructure and devices needed to support them.


To address these IT operation challenges, Gartner has been promoting the need for AI Operations, or Artificial Intelligence for IT Operations, where machine learning and big data are used for a new holistic view into IT infrastructure for improved data center monitoring, service management and automation.  Let’s see if Gartner is onto something here.


Gartner: AI Ops Platform*

Gartner AI Ops Image.png


AI Operations starts with gathering large and varied data sets: lots of telemetry data from across disparate systems (applications, servers, network, storage, etc.) to be analyzed.  Using machine learning (ML) algorithms, this data is mined to gain new AI insights that can be used to optimize across these various infrastructure systems.  For example, say an online retailer wants to assess its readiness for Cyber Monday workloads.  If it used domain-specific ITOM tools, it would only get a silo view (i.e., server- or storage-only) into its IT operations, which would limit its insights.  AI Operations tools benefit from aggregating analysis across multiple data sources, providing a broader, more complete view into the IT infrastructure that can be used to improve data center monitoring and planning.


In addition to monitoring, AI Operations can impact other IT operations processes, such as decreasing the time and effort required to identify and avoid availability or performance problems.  For example, it’s best to be notified that a data path between a server and a shared storage port is saturated, and then quickly receive a recommended alternative path with plenty of time to move applications that may be overloading the saturated path.  Compare this approach to one where an administrator receives separate notices about performance problems on networking and storage ports, then needs to confirm the two issues are related before trying to find an acceptable solution.  AI Operations provides the opportunity to use machine learning to identify interconnected resource trends and dependencies in order to quickly analyze problems, compared to manual, silo approaches that are typically based on trial and error.
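As a toy illustration of the statistical baselining behind this kind of alerting (not Hitachi’s actual algorithm), a saturated port can be flagged by comparing each new telemetry sample against a rolling mean and standard deviation:

```python
from statistics import mean, stdev

def find_anomalies(samples, window=10, threshold=3.0):
    """Flag indices where a sample deviates from the trailing
    window's mean by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        base = samples[i - window:i]
        mu, sigma = mean(base), stdev(base)
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Steady ~40% port utilization, then a sudden saturation spike at index 20.
telemetry = [40, 41, 39, 40, 42, 40, 41, 39, 40, 41,
             40, 39, 41, 40, 42, 41, 40, 39, 41, 40, 95]
print(find_anomalies(telemetry))  # → [20]
```

Production AIOps tools layer far more sophisticated models on top, and correlate across domains, but the core idea of learning a baseline and alerting on deviation is the same.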


Hitachi Vantara’s recent announcements for its Agile Data Infrastructure and Intelligent Operations portfolio illustrate how these new machine learning and big data approaches can transform IT operations.  The new releases and integration between Hitachi Infrastructure Analytics Advisor (HIAA) and Hitachi Automation Director (HAD) provide new AI Operations capabilities to establish intelligent operations and the foundation for autonomous management practices:


  • Predictive Analytics – New ML algorithms and custom risk profiles to assess future resource (virtual machine, server or storage) requirements that incorporate all resource interdependencies.  It provides a more complete and accurate resource forecast as it includes performance and capacity as well as all dependent resource requirements on the same data path. This helps to ensure you are upgrading all the right data path resources with the proper configurations when adding a new application workload.
  • Enhanced Root Cause Analysis – A new AI heuristic engine to diagnose problems across the data path faster (4x) with prescriptive analytics recommendations.  By providing suggested resolutions to common problems, the effort and expertise required to troubleshoot performance bottlenecks is greatly reduced while further lowering mean-time-to-repair (MTTR) objectives.
  • Cross-Product Integration – New integration between HIAA, HAD and Hitachi Data Instance Director (HDID) enables new opportunities for AI-enhanced management practices.  HIAA can now directly execute QoS commands or suggested problem resolutions (e.g., required resource configuration changes) seamlessly through HAD’s automated management workflows.  Through its HDID integration, HAD incorporates new data protection policies (e.g., snapshots and clones) into its automated provisioning processes for improved resource orchestration based on both QoS and data resiliency best practices.
  • Improved Management Integration – Enhanced REST APIs provide increased flexibility to integrate HAD into existing management frameworks and practices.  For example, HAD can easily be integrated with IT Service Management (ITSM) ticketing systems, such as ServiceNow, to incorporate the right authentication process or be tied into a broader automated management workflow.


These new updates help to deliver on Hitachi’s AI Operations approach for intelligent operations based on four key data center management steps to deliver enhanced analytics and automation with a continuous feedback loop:


Hitachi's AI Operations Approach for Intelligent Operations

AI Ops Image.png

  • Alert: Utilize ML to continuously monitor across multi-domains (virtual machines, servers, network and storage) and quickly be alerted for performance anomalies while ensuring service levels for business applications.  This helps to filter out unwanted noise and events, so you can keep focused on avoiding problems or issues that might affect your users.
  • Analyze: Leverage algorithms to identify historical trends, patterns or changing application workloads to be better informed on how to optimize resources on the data path or increase utilization of underutilized resources.
  • Recommend: Provide new insights to quickly identify the root cause of problems or analyze evolving requirements to optimally plan for new configurations that may be required for data center expansion.
  • Resolve: Drive action with integrated workflows or orchestration to streamline adaptive configuration changes or necessary problem fixes.


These new integrated operational capabilities can help you better analyze, plan and execute the changes necessary to optimize IT operations.  This ensures data center systems are running efficiently and at the right cost, which is the real promise of AI Operations.  Whether it’s highlighting new trends, identifying problems faster or improving delivery of new resources, AI Operations’ greatest impact is to help IT administrators do their jobs better with the right insights so they can focus on projects that have a strategic impact on their business.






Richard Jew's Blog



AIOps Platform Enabling Continuous Insights Across IT Operations Management (ITOM)

Market Guide for AIOps Platforms - Gartner, August 2017

Looking At, And Beyond a Storage / Server Mindset


For those who want to skip ahead: data center modernization – creating the next generation data center – requires you to consider systems, protection and operations. Many vendors think all you need is new systems - WRONG! As we’ll discuss, that mindset will end up raising costs and can inhibit innovation. At the end are links to dig deeper on this topic. Check out the press release too!


ross.jpg

Ever seen The Joy of Painting? If not, check it out. Bob Ross was a great painter and a calming force, which is impressive given his history as a US Air Force drill sergeant. A recurring component of Ross’s shows and paintings is ‘happy little trees.’ In fact, happy trees show up so often that there are memes, shirts and more for the phrase.


Interestingly, there’s a strong correlation between happy trees and vendors simplifying data center modernization as nothing more than a refresh of systems – storage, server, networking. That may appeal to the IT junkies in us, but thinking modernization only equals new systems is counterproductive to success.


Why? Because data center modernization encompasses a lot more than systems. People may love Bob Ross’ happy trees, but without the rest of the landscape it isn’t a picture. It’s incomplete. Similarly, if you don’t look beyond systems when modernizing, your costs may go UP instead of down and innovation may slow down.


Why Modernize?


To understand why a systems-only mindset can hurt you, it helps to consider why IT leaders modernize to create next generation data centers. Priorities vary, but the reasons I continually hear are:


  • Increase operational efficiency and reduce capital expenditures
  • Accelerate time to market for new projects and programs
  • Improve customer experiences
  • Minimize risk to the business


Said another way, data center modernization is about supporting and accelerating innovation. Refreshing systems to get new functionality and meet evolving SLAs is absolutely a part of this, but it only gives you modern systems – not a modern, next generation, data center. So, if your vendor only focuses on systems… watch out!


Why Systems Alone Don’t Modernize a Data Center


car.png

If I put a jet engine on a 1970s Chevy, do I have a modern car? That’s pretty clearly a big ‘no.’ I may have a new engine, but everything that surrounds it is not optimized for a jet engine. The stability of the vehicle, the quality of the driver experience and more need to be modernized to support that engine! And since the owner is likely to spend more time fixing things than driving, where’s the ROI?!


Here’s one that hits closer to home (thanks, Gary Breder). If you have a single camera checking the entrance to a data center, you can have one person monitoring it. But what if you add 4 or even 100 cameras to cover the inside and outside of the facility? Can one person watch all those feeds? Not well. You could linearly add staff as camera counts increase, but that adds cost and eats up your staff’s time, keeping them from more strategic work! Instead, you need to rethink – modernize – management.


The same is true with data centers. You must modernize processes – and protection – to scale with the changes in your environment. Otherwise you have inefficiencies that cost you time, money and agility.


A More Complete Approach to Modernization


There are several ways to ‘slice’ a broader view of data center modernization, but I like to keep things simple, so let’s break things down into 3 categories to start:


  1. Agile Data Infrastructure
  2. Modern Data Protection
  3. Intelligent (AI) Operations


These categories allow us to cover the areas where we are most likely to enhance systems, software and operational processes for increased operational efficiency, faster time to market, improved customer experiences and accelerated adoption of new business models. We can define each area as follows:


Agile Data Infrastructure: The combination of systems needed to consistently deliver data at high speed for the best possible experience. These systems should be resilient, support a broad, diverse range of use cases and enable the business to move quickly without being constrained.

    • Buying Consideration: If you have more than one application in your data center, odds are you’ll need a few different types of systems, so look for a vendor that can offer a range of choices to meet your needs.


Modern Data Protection: Software and processes that ensure continuous availability of data – independent of external influences – in support of the customer experience. Modern protection also supports adherence to compliance requirements, new regulations and data security.

    • Buying Consideration: With new data privacy guidelines and concerns about security, data protection is becoming even more complex. Look for a partner that has a solid consulting team and knows how to integrate their offering into your existing framework.


Intelligent (AI) Operations: Integrated software that leverages AI and machine learning to analyze, predict, prescribe and execute changes to achieve data center SLAs. This software ensures systems are continually optimized for peak performance, stability and cost efficiency. This frees data center staff to focus on strategic initiatives / implementing new technologies, accelerating innovation.

    • Buying consideration: This is an emerging area that will change a lot over the next few years. Be sure to look at vendors with an API integration focus. This will let them integrate their products with other vendor offerings to create a ‘collaborative’ AI or a ‘hive mind’ for deeper insights and more robust automation. Check out our AI Operations portfolio, including Hitachi Infrastructure Analytics Advisor and Automation Director.


modern.png

If I go back to the car analogy, a next generation car will certainly have an engine (system), but it will also have a new user interface and challenge our thoughts on driving (AI operations), as well as the rules and regulations of the roads (protection). Kind of like this image of how the inside of a self-driving car might change in the future.


Or as Bob Ross might say, happy little trees are wonderful, but without the sky, clouds, and other things, it really isn’t a complete picture.


Hey, That’s It?


Hold on! We didn’t describe each of the areas in detail! I know. That’s coming in future blogs!


In the meantime, check out the press release for our new VSP systems and AI Operations software. Also check out this video series we did on data management and creating an integrated AI operations portfolio.


Ask ten people for their thoughts on Artificial Intelligence and you will get answers that span the emotional range from “Alexa is great!” to “HAL 9000: I’m sorry Dave, I’m afraid I can’t do that”.


Personally, I believe that we need to embrace this nascent technology and trust that we will never need to meet HAL 9000, good intentions or not.


So how does Intelligent Automation impact YOU, a knowledge worker in high tech?  Especially if you’re a highly valued and highly stressed member of an IT team, responsible for responding quickly and often to business and client needs, while at the same time ensuring that you’re “keeping the lights on” with zero impact to users’ ability to access business applications.


You’ve read in my previous blogs how Hitachi Vantara’s Hitachi Automation Director software can help accelerate resource development and reduce manual steps by >70%.


THIS IS PART II of the blog - how to get an ALEXA SKILL up and running.


PART III of the blog, covering Hitachi Automation Director’s capability to integrate with the Alexa Skill, will be posted later.


Today, let’s take it a step further by discussing what you can do with Hitachi Automation Director’s flexible REST API, which carries the necessary context via a JSON payload. Specifically, we’ll look at how HAD’s infrastructure service catalog can be presented as menu items to an upper-layer CMP (Cloud Management Platform) or a voice-oriented CMP via an Alexa Skill. The Alexa demo is a technology preview that showcases how HAD can integrate with a northbound cloud management layer.
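To make that concrete, here is a minimal Python sketch of building such a REST call. The base URL, port, endpoint path, service ID, and payload field names are all invented for illustration – consult the actual HAD REST API documentation for the real contract. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical HAD endpoint -- illustrative only, not the documented
# Automation Director API.
HAD_BASE = "https://had.example.com:22015/Automation/v1"

def build_submit_request(service_id, capacity_gb):
    """Build (but do not send) a service-submit request with a JSON payload."""
    payload = {
        "name": "CreateVolume",
        "submitParameters": {"capacityInGB": capacity_gb}  # invented field names
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=HAD_BASE + "/objects/Services/" + str(service_id) + "/actions/submit/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST")

req = build_submit_request(42, 20)
print(req.get_full_url())
print(req.data.decode("utf-8"))
```

The point is simply that a catalog item plus a JSON payload is all a northbound CMP – or an Alexa Skill – needs in order to drive provisioning.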


That’s correct – use ALEXA in conjunction with Hitachi Automation Director to provision your storage, among other cool things – whoa!!!




FULL DISCLOSURE:  This is a technology preview demo to showcase Hitachi Automation Director capabilities, as a proof of concept. We have created an Alexa and Hitachi Automation Director demo and have shown customers what can be done. Today, this is not a formally released or supported feature. For actual production use, you need to consider factors such as cloud integration, security, and contractual obligations with cloud vendors like Amazon Alexa or Google Home.


Are you ready to get started?  Some minor (or major) coding to follow, depending on your background.

The following is credited to Craig Chan, Hitachi Vantara’s Director of SW Strategy, resident man of a thousand hats, and former developer, in his own words.


It’s always easier to leverage someone else’s work and that’s just what I did here.  I knew I wanted to use Python because we have sample Python scripts for our products that are easy to use.  So, a quick search on the internet turned up a nice Alexa Python Tutorial – Build a voice experience in 5 minutes or less.  What a great way to get started!  Setting up this dialog was a snap and it did only take 5 minutes.  I spent more time setting up my Amazon developer account.  Remember last time I took you through the dialog with Alexa to manage my storage?  Well here is where it all began.  Let’s look at the Color Picker dialog:


“Alexa, open Color Picker”


“Welcome to the Alexa skills kit sample. Please tell me your favorite color by saying, my favorite color is red.”


“My favorite color is blue.”



“I now know your favorite color is blue. You can ask me your favorite color by saying, what’s my favorite color?”



“What’s my favorite color?”



“Your favorite color is blue. Goodbye.”


Simple, but effective.  I created my first Alexa Skill and it didn’t take much effort!  Now how does that all work?  Well the trick here is all the code is already written for you.  By selecting the “blueprint” you were basically taking Amazon’s code and using it as a skill.  We have to read through that code to understand how it works, but instead of pasting 207 lines of code here, go ahead to Amazon’s site and check it out.

Now I want to take that dialog and make it storage specific.  So instead of the above dialog, the new dialog looks like this:



“Alexa, open my storage”



“Your storage is running properly. How can I help you manage your storage today? You can say, create a volume of 100 gigabytes.”



“Create a volume of 20 gigabytes.”



“Okay. I can create a volume of 20 gigabytes. I can also format your volume for you. Just say, please format my volume.”



“Please format my volume.”



“I am formatting your volume of 20 gigabytes. Formatting volume complete. Thank you for using Alexa to manage your storage.”


Great! Alexa just created and formatted a volume of 20 GB!  Well, not exactly.  You had that dialog with Alexa, but it didn’t really do anything.  Having the dialog is pretty cool though and it did hear what capacity you asked for and listened to your request to format it.  What happened here is I took the “myColorPicker” function and just modified the text.  I also wanted to know what variables were being saved so I changed those as well.  Now instead of saving my favorite color, it was saving my capacity choice.  Take a look at the code I attached here. It’s in Python so it’s pretty easy to read through.


As you read through the code you might have noticed something called an “intent”, or if you were paying real close attention, you might have noticed something else called a “slot”.  Intents are defined in the Amazon developer portal where you develop the actual skill that uses the code you put into Lambda.  The Color Picker Skill uses “MyColorIsIntent” and “WhatsMyColorIntent”.  The slot is the “LIST_OF_COLORS” or choices that you have for colors (I added purple to mine).  For my new skill, let’s call it VSPG Storage Creator, I changed the intents to “MyCapacityIsIntent” and “FormatVolumeIntent”.  Then I changed the slot to “LIST_OF_CAPACITIES”.  Now I didn’t want to go wild with capacities so only capacities of 10-100 in increments of 10 were allowed.  And one last thing, some sample utterances.  These are the phrases you are expecting the person talking to Alexa to say. Depending on how flexible you want Alexa to be, you can change this to whatever you want, but for simplicity, I just modified the Color Picker ones to “MyCapacityIsIntent Create a volume of {Capacity} gigabytes” and “FormatVolumeIntent please format my volume”.
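To tie those names together, the intent schema behind the skill might look roughly like this in the developer portal (the exact format depends on the portal version; the slot type is the custom LIST_OF_CAPACITIES described above):

```json
{
  "intents": [
    {
      "intent": "MyCapacityIsIntent",
      "slots": [
        { "name": "Capacity", "type": "LIST_OF_CAPACITIES" }
      ]
    },
    { "intent": "FormatVolumeIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.CancelIntent" },
    { "intent": "AMAZON.StopIntent" }
  ]
}
```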


Okay, that was a lot to read, and probably confusing unless brought into context.  Let’s follow the instructions below to first setup Lambda:





Code?! Yes code!  But this code is pretty easy, even if it’s really long.  So to make it easier on you, just copy and paste the code below into the code area.


This is a demo VSP-G Storage skill built with the Amazon Alexa Skills Kit.




from __future__ import print_function


# --------------- Helpers that build all of the responses ----------------------

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': "SessionSpeechlet - " + title,
            'content': "SessionSpeechlet - " + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }


def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }


# --------------- Functions that control the skill's behavior ------------------

def get_welcome_response():
    """ If we wanted to initialize the session to have some attributes we could
    add those here
    """
    session_attributes = {}
    card_title = "Welcome"
    speech_output = "Your storage is running properly. " \
                    "How can I help you manage your storage today? " \
                    "You can say, create a volume of 100 gigabytes."
    # If the user either does not reply to the welcome message or says something
    # that is not understood, they will be prompted again with this text.
    reprompt_text = "Sorry, I didn't catch that. " \
                    "How can I help you manage your storage today? " \
                    "You can say, create a volume of 100 gigabytes."
    should_end_session = False
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def handle_session_end_request():
    card_title = "Session Ended"
    speech_output = "Thank you for managing your storage with Alexa. " \
                    "Have a nice day! "
    # Setting this to true ends the session and exits the skill.
    should_end_session = True
    return build_response({}, build_speechlet_response(
        card_title, speech_output, None, should_end_session))


def create_desired_capacity_attributes(desired_capacity):
    return {"desiredCapacity": desired_capacity}


def set_capacity_in_session(intent, session):
    """ Sets the capacity in the session and prepares the speech to reply to the
    user.
    """
    card_title = intent['name']
    session_attributes = {}
    should_end_session = False

    if 'Capacity' in intent['slots']:
        desired_capacity = intent['slots']['Capacity']['value']
        session_attributes = create_desired_capacity_attributes(desired_capacity)
        speech_output = "Okay. I can create a volume of " + \
                        desired_capacity + " gigabytes"\
                        ". I can also format your volume for you. " \
                        "Just say, please format my volume."
        reprompt_text = "I can also format your volume for you. " \
                        "Just say, please format my volume."
    else:
        speech_output = "I don't have that capacity available. " \
                        "Please try again."
        reprompt_text = "I don't have that capacity available. " \
                        "Please tell me a capacity number I can use."
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def format_volume_from_session(intent, session):
    session_attributes = {}
    reprompt_text = None

    if session.get('attributes', {}) and "desiredCapacity" in session.get('attributes', {}):
        desired_capacity = session['attributes']['desiredCapacity']
        speech_output = "I am formatting your volume of " + desired_capacity + " gigabytes"\
                        ". Formatting volume complete. Thank you for using Alexa to manage your storage."
        should_end_session = True
    else:
        speech_output = "I don't have any capacity to format. " \
                        "You can say, create a volume of 100 gigabytes."
        should_end_session = False

    # Setting reprompt_text to None signifies that we do not want to reprompt
    # the user. If the user does not respond or says something that is not
    # understood, the session will end.
    return build_response(session_attributes, build_speechlet_response(
        intent['name'], speech_output, reprompt_text, should_end_session))


# --------------- Events ------------------

def on_session_started(session_started_request, session):
    """ Called when the session starts """
    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they
    want
    """
    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # Dispatch to your skill's launch
    return get_welcome_response()


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """
    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']

    # Dispatch to your skill's intent handlers
    if intent_name == "MyCapacityIsIntent":
        return set_capacity_in_session(intent, session)
    elif intent_name == "FormatVolumeIntent":
        return format_volume_from_session(intent, session)
    elif intent_name == "AMAZON.HelpIntent":
        return get_welcome_response()
    elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":
        return handle_session_end_request()
    else:
        raise ValueError("Invalid intent")


def on_session_ended(session_ended_request, session):
    """ Called when the user ends the session.

    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # add cleanup logic here


# --------------- Main handler ------------------

def lambda_handler(event, context):
    """ Route the incoming request based on type (LaunchRequest, IntentRequest,
    etc.) The JSON body of the request is provided in the event parameter.
    """
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])

    """
    Uncomment this if statement and populate with your skill's application ID to
    prevent someone else from configuring a skill that sends requests to this
    function.
    """
    # if (event['session']['application']['applicationId'] !=
    #         "[unique-value-here]"):
    #     raise ValueError("Invalid Application ID")

    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])

You’ve just coded your very own Alexa skill! As you put that Python script into Lambda, you might have noticed that we created our own names for the intents.  This leads us into configuring the skill to work with our intents.  Intents are things you want to happen.  For us, it’s about creating a volume and formatting that volume.  For these intents, we need to define a set of valid values (capacity amounts) and utterances (phrases that Alexa will understand).  Let’s configure our skill.
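As a sketch of that configuration, the custom slot values and sample utterances described earlier can be entered along these lines (the exact entry format depends on the developer portal version):

```
LIST_OF_CAPACITIES   10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100

MyCapacityIsIntent   Create a volume of {Capacity} gigabytes
FormatVolumeIntent   please format my volume
```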



And we are done! Go ahead and test your new Alexa Skill and see how you can interact with Alexa.  Try different utterances and even different dialog in the code so Alexa says different things back to you.  Also give it your own invocation name so it becomes your very own unique skill.


Stay tuned for Part III of the blog, same time same channel!



Per IDC’s Copy Data Management Challenge, “65% of storage system capacity is used to store non-primary, inactive data”. In fact, inactive data residing on flash is the single biggest threat to the ROI of data center modernization initiatives. As my friends and I get older, we often find ourselves talking about “right-sizing”, or moving to a smaller and less expensive piece of real estate. In many cases life has evolved or perhaps needs have changed, and we simply don’t want or need the cost of maintaining a larger residence.  The benefits are obvious when it comes to savings in mortgage, utilities, and general upkeep.  Well, your data has a lifecycle as well: it starts out very active, with high IO and the need for a high quality of service and low latency.  However, as data ages it’s just not economical or necessary for it to reside in the “high rent” district, which in most cases is an active flash tier.




Now data tiering through the lifecycle is not a new concept, but the options available to you have never been greater.   Destinations such as lower cost/performance tiers of spinning disk can be an option.  If your organization has deployed a private cloud, that might be an excellent destination to tier inactive data.  For those who have adopted public cloud IaaS, that certainly is a low-cost destination as well.  Let’s explore some of these options and solutions for managing the data lifecycle through the different options available.  More importantly, let us look at some issues that should be considered before creating a data lifecycle residency plan, with the goal of maximizing your current investments in on-premise all-flash arrays and both private and public clouds.


Automated Storage Tiering

Deploying automated storage tiering is a good place to get started as the concept is familiar to most storage managers.  For example, Hitachi Dynamic Tiering is software which allows you to create 3-tier pools within the array, and it allows you to enact policies which will automatically move data to a specified type of media pool once the pre-defined criteria have been met.



In a modern hybrid flash array like the Hitachi Vantara VSP G series, your pools can be defined based upon the type of storage media in the array.  This is especially economical in the VSP G series because the array can be configured with SSDs, Hitachi Flash Modules (FMD), or hard disk drives.  Essentially, high IO data starts residency on high performance SSD or FMD, and then dynamic tiering automatically moves it to low cost hard disk drives as it ages and becomes less active.  The savings in storage real estate in terms of cost per GB can be well over 60%.  But wait, there are more benefits to be had.
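Conceptually, a tiering policy is just a mapping from observed activity to a media pool. The sketch below is illustrative only – the thresholds are invented, and Hitachi Dynamic Tiering actually relocates data at the page level inside the array based on monitored IO counters:

```python
def choose_tier(iops_per_gb, days_since_access):
    """Map observed activity to a media pool (thresholds are made up)."""
    if iops_per_gb >= 1.0 and days_since_access < 7:
        return "Tier 1: SSD/FMD"        # hot, latency-sensitive data
    elif days_since_access < 90:
        return "Tier 2: SAS HDD"        # warm data
    else:
        return "Tier 3: NL-SAS HDD"     # cold data

print(choose_tier(5.0, 1))    # hot data stays on flash
print(choose_tier(0.1, 30))   # warm data moves to SAS disk
print(choose_tier(0.0, 400))  # cold data lands on NL-SAS
```

The real value of automated tiering is that this evaluation happens continuously and transparently, with no application-level changes.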


Integrated Cloud Tiering – Private Clouds and Object Storage

It’s no secret that migrating inactive data to the cloud can lead to storage savings well over 70%.  The benefit doesn’t stop there, as a well-managed data lifecycle frees up top-tier flash for higher priority active data. Many top financial institutions choose to tier inactive data off flash tiers and onto lower cost private cloud object storage.  In this way, they get the savings of moving this data into a low-cost tier, and there are no questions of data security and control behind the corporate firewall.  In addition, if the data ever needs to be moved back to an active tier, it can be done quickly and inexpensively without the data egress fees incurred by public cloud providers.  Private cloud object storage like the Hitachi Content Platform (HCP) gives your enterprise a “rent-controlled” residence with all the benefits of a public cloud and without concerns over security, because you are in control of your data.


Cloud Tiering – Public Clouds

Public clouds like Amazon and Azure have changed the data residency landscape forever.  They are an excellent “low-cost” neighborhood for inactive data.  Companies of all sizes, from small to the largest enterprise, leverage the low cost and ‘limitless’ storage of public clouds as a target for inactive and archive data.


Potential Issues - Tiering Data to the Cloud

The concept of tiering to either public or private clouds is simple, but executing a solution may not be as straightforward.  Many storage vendors claim the ability to tier to the cloud, but when you look at their solutions, you’ll often find that they are not transparent to applications and end-users, requiring significant rework of your IT architecture and a lower / delayed ROI in data center modernization. These solutions add complexity to your environment and often add siloed management requirements. The bottom line is that very few vendors understand and offer an integrated solution of high performance flash, automated tiering, and a choice of migration to multiple cloud types. Regarding public clouds, it should be noted that a downside is that if the data is not quite inactive, let’s say “warm,” it can be very costly to pull it back from the public cloud due to the previously mentioned data egress fees.  Not to mention it can take a very long time based on the type of service level agreement. For this reason, many tenants choose to only migrate backups and cold archive data to public clouds.
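To put the egress caveat in numbers, here is a back-of-the-envelope sketch assuming an illustrative $0.09/GB egress rate – actual public cloud rates vary by provider, region, service tier, and contract:

```python
def egress_cost_usd(data_gb, rate_per_gb=0.09):
    """Rough cost to pull data back from a public cloud (illustrative rate)."""
    return data_gb * rate_per_gb

# Pulling back 50 TB of "warm" data that turned out not to be cold:
warm_tb = 50
cost = egress_cost_usd(warm_tb * 1000)
print("Egress for %d TB: $%.2f" % (warm_tb, cost))
```

Against that, recalling the same data from an on-premise object store like HCP incurs no per-GB egress fee at all, which is exactly why "warm" data belongs behind the firewall.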


Hitachi Vantara Cloud-Connected Flash – 1 Recipe integrates all the right Ingredients

Cloud-Connected Flash is a solution from Hitachi Vantara which delivers a complete solution for file data lifecycle residency.  Hitachi is a leading vendor in hybrid flash arrays, all-flash arrays, object storage and cloud.



The solution is an easy concept to understand: “Data should reside on its most economic space”.  As illustrated in the graphic above, active NAS data is created and maintained in a Hitachi Vantara VSP G or F series unified array.  Within the array, data is dynamically tiered based on the “migration policy” between pools of SSD, FMD (Hitachi Flash Modules) and disk drives.  As the file data ages, Hitachi data migration to cloud software moves the data, based on policy, to your choice of clouds (public Amazon, Azure, IBM Cloud or private HCP, Hitachi Content Platform).  When migrating data to HCP, the data can also be pulled back into the top flash tiers if needed, creating a highly flexible and dynamic data residency environment.  But the value doesn’t stop at the savings in cost of storage and maintenance. The Hitachi cloud-connected flash solution can also include an analytics component to glean insights from your data lake, which is comprised of both on-premise and cloud data.
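A minimal sketch of the kind of age-based migration policy described above (the function, thresholds, and destination names are invented for illustration; the real policy is configured in the Hitachi data migration software):

```python
def migration_target(file_age_days, policy):
    """Pick a destination for a file based on an age-ordered policy list."""
    # policy: list of (min_age_days, destination), sorted oldest-first
    for min_age, destination in policy:
        if file_age_days >= min_age:
            return destination
    return "VSP flash tier"  # newly created data stays on the array

policy = [
    (365, "public cloud (archive)"),    # > 1 year: cheapest tier
    (90,  "private HCP object store"),  # 3-12 months: easily recalled
]

print(migration_target(10, policy))
print(migration_target(120, policy))
print(migration_target(800, policy))
```

The ordering matters: because HCP recall is fast and egress-free, the middle band can safely hold data that might still warm up again, while only truly cold data goes to the public cloud.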


Cloud to Data Insights with Pentaho Analytics

Pentaho Analytics enables the analysis of your data both on premise and in public and private clouds. As an expert in IoT and analytics, Hitachi Vantara offers the customizable Pentaho platform to analyze data and create dashboards and reports in your cloud connected flash environment. The goal being to achieve better business outcomes by leveraging your Hitachi Vantara cloud-connected flash solution. Pentaho offers data integration, a highly flexible platform for blending, orchestrating, and analyzing data from virtually any source, effectively reaching across system, application and organizational boundaries.

  • Run Pentaho on a variety of public or private cloud providers, including Amazon Web Services (AWS) and Microsoft Azure
  • Leverage scale-out architecture to spread data integration and analytics workloads across many cloud servers
  • Connect on premise systems with data in the cloud and blend a variety of cloud-based data sources together
  • Seamlessly integrate and analyze leading Big Data sources in the Cloud, including Amazon Redshift and hosted Hadoop distributions
  • Access and transform data from web services and enterprise SaaS applications



Learn More, Hitachi Vantara Can Help

Hitachi Vantara Cloud-Connected Flash solutions are highly flexible, and individual components can be deployed based on business needs.  This enables customers to start with dynamic array tiering, then add cloud tiering and ultimately analytics. Please contact your Hitachi Representative or Hitachi Partner to learn how your organization can benefit from cloud-connected flash.


NVMe (non-volatile memory express) is a standard designed to fully leverage the performance benefits of non-volatile memory in all types of computing and data storage.  NVMe’s key benefits include direct access to the CPU, lightweight commands, and highly scalable parallelism, which all lead to lower application latency and insanely fast IO. There has been a lot of hype surrounding NVMe in the press and how it can put your IT performance into the equivalent of Tesla’s “Ludicrous mode”, but I would like to discuss some “real world” considerations where NVMe can offer great benefits and perhaps shine a caution light in areas that you might not have considered.  As NVMe is in its infancy as far as production deployment, its acceptance and adoption are being driven by a need for speed.  In fact, Hitachi Vantara recently introduced NVMe caching in its hyperconverged Unified Compute Platform (UCP).  This is a first step to mobilizing the advantages of NVMe to accelerate workloads by using NVMe in the caching layer.


Parallel vs Serial IO execution - The NVMe Super Highway

What about storage? Where are the bottlenecks, and why can’t SAS and SATA keep up with today’s high performance flash media?  The answer is that both SAS and SATA were designed for rotating media, long before flash was developed; consequently these command sets have become the traffic jam on the IO highway. NVMe is a standard based on peripheral component interconnect express (PCIe) and it’s built to take advantage of today’s massively parallel architectures.  Think of NVMe as a Tesla Model S capable of 155mph, stifled by old infrastructure (SAS/SATA) and a 55mph speed limit. All that capability is wasted. So what is driving the need for this type of high performance in IT modernization?  For one, Software-Defined Storage (SDS) is a rapidly growing technology that allows for the policy-based provisioning and management of data storage independent of the underlying physical storage. As datacenter modernization is at the core of IT planning these days, new technologies such as Software-Defined Storage are offering tremendous benefits in data consolidation and agility. As far as ROI and economic benefits, SDS’s ability to be hardware agnostic, scale seamlessly, and deliver simplified management is a total victory for IT. So then what is the Achilles heel for SDS and its promise to consume all traditional and modern workloads?  Quite frankly, SDS has been limited by the performance constraints of traditional architectures. Consequently, many SDS deployments are limited to applications that can tolerate the latency caused by the aforementioned bottlenecks.


Traditional Workload: High-Performance OLTP and Database Applications

Traditional OLTP and database workloads are the heartbeat of the enterprise. I have witnessed instances of customers having SDS deployments fail because of latency between storage and the application, even when flash media was used.  Surely the SDS platform, network, and compute were blazing fast, but the weak link was the SAS storage interface.  Another problem is that any type of virtualization or abstraction layer used to host the SDS instance on a server is going to consume more performance than running that service on bare metal. In an SDS environment, highly transactional applications will require the additional IOPS to keep latency from the virtualization layer in check and deliver the best quality of service to the field.  At the underlying storage level, traditional SAS and SATA constrain flash performance. The bottom line is that NVMe inherently provides much greater bandwidth than traditional SAS or SATA.  In addition, NVMe at the media level can handle 64,000 queues compared to SAS (254 queues) and SATA (32 queues).   This type of performance and parallelism can enable high-performance OLTP and deliver optimized performance with the latest flash media.  So the prospect is that more traditional high-performance OLTP workloads can be migrated to an SDS environment enabled by NVMe.
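The queue arithmetic alone shows the scale of the difference, using the round numbers above (SAS and SATA each service a single queue, while NVMe allows up to 64K commands in each of its 64K queues):

```python
# Maximum outstanding commands per interface, using the round numbers quoted
# in the text (one queue for SAS/SATA; up to 64K commands per NVMe queue).
interfaces = {
    "SATA": 1 * 32,
    "SAS": 1 * 254,
    "NVMe": 64000 * 64000,
}

for name, outstanding in interfaces.items():
    print("%s: up to %d outstanding commands" % (name, outstanding))
```

That multi-million-fold difference in potential parallelism is what lets NVMe keep modern flash media and many-core hosts busy at the same time.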



Caveat Emptor – The Data Services Tax

The new world of rack-scale flash, SDS, and hyperconverged infrastructure offerings promise loftier performance levels, but there are speed bumps to be considered.  This is especially true when considering the migration of mission-critical OLTP applications to a software-defined environment.  The fact of the matter is that data services (compression, encryption, RAID, etc.) and data protection (snaps, replication, and deduplication) reduce IOPS. So be cautious when considering a vendor’s IOPS specification, because in most cases the numbers are for unprotected and un-reduced data. In fact, data services can impact IOPS and response times to the extent that AFAs with NVMe will not perform much better than SCSI-based AFAs. The good news is that NVMe performance and parallelism should provide plenty of horsepower (IOPS) to enable you to move high performance workloads into an SDS environment. The bad news is that you will need your hardware architecture to be more powerful and correctly designed to perform data services and protection faster than ever before (e.g. more IO received per second = more deduplication processes that must occur every second). Note that you also need to consider whether or not your SDS application, compute, network and storage are designed to take full advantage of NVMe’s parallelism.  Also note that a solution is only as fast as its weakest link, and for practical purposes it could be your traditional network infrastructure. If you opt for NVMe on the back-end (between storage controllers and media) but do not consider how to implement NVMe on the front-end (between storage and host / application), you may just be pushing your performance bottleneck to another point of IO contention and you won't get any significant improvement.
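One way to sanity-check a vendor's headline IOPS figure is to apply the "data services tax" yourself. The overhead factors below are invented placeholders – measure your own during a proof of concept:

```python
def effective_iops(raw_iops, services):
    """Discount a headline IOPS figure by per-service overhead factors."""
    for _name, retained_fraction in services:
        raw_iops *= retained_fraction
    return round(raw_iops)

# Placeholder overheads -- real numbers vary widely by platform, workload,
# and whether each service runs inline or post-process.
services = [
    ("compression", 0.85),
    ("deduplication", 0.80),
    ("encryption", 0.95),
    ("snapshots/replication", 0.90),
]

print(effective_iops(1_000_000, services))
```

Even with modest per-service overheads, a million-IOPS spec sheet number can shrink by more than 40% once everything is switched on, which is exactly why proof-of-concept testing with your data services enabled matters.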


Modern Workload: Analytics at the Edge

It seems as though “Analytics” has replaced “Cloud” as the IT modernization initiative du jour. This is not just hype, as the ability to leverage data to understand customers and processes is leading to profitable business outcomes never before possible. I remember just a few years ago the hype surrounding Hadoop and batch analytics in the core data center and in the cloud.  It was only a matter of time before we decided that the best place to produce timely results and actions from analytics is at the edge. The ability to deploy powerful compute in small packages makes analytics at the edge (close to where the data is collected) a reality.  The fundamental benefit is the network latency saved by having the compute function at the edge.  A few years ago, data would travel via a network or telemetry-to-network and then to the cloud; that data would be analyzed and the outcome delivered back the same way it arrived.  So edge analytics cuts out data traversing the network and saves a significant chunk of time.  This is the key to enabling time sensitive decisions like an autonomous vehicle avoiding a collision in near real-time.  Using NVMe/PCIe, data can be sent directly to a processor at the edge to deliver the fastest possible outcomes.  NVMe enables processing latency to be reduced to microseconds and possibly nanoseconds.  This might make you a little more comfortable about taking your hands off the wheel and letting the autonomous car do the driving…


The Take Away

My advice to IT consumers is to approach any new technology with an open mind and a teaspoon of doubt. Don’t get caught up in hype and specs. Move at a modernization pace that is comfortable and within the timelines of your organization.  Your business outcomes should map to solutions and not the other way around. “When in doubt, proof it out”: make sure your modernization vendor is truly a partner.  They should be willing to demonstrate a working proof of concept, especially when it comes to mission-critical application support.  Enjoy the new technology; it’s a great time to be in IT!


More on NVMe

NVMe 101: What’s Coming To The World of Flash

How NVMe is Changing Networking Part 1

How NVMe is Changing Networking Part 2

Redesigning for NVMe: Is Synchronous Replication Dead?

NVMe and Data Protection: Time To Put on Those Thinking Caps

The Brocade view on how NVMe affects network design

NVMe and Me – How To Get There

An In Depth Conversation with Cisco.


Welcome back! (Yes, I was waiting for this opportunity to show my age with a Welcome Back Kotter image).


For those of you that haven’t been following along in real time – c’mon man! – we’re in the midst of a multi-blog series around NVMe and data center design (list of blogs below).


That’s right, data center design. Because NVMe affects more than just storage design. It influences every aspect of how you design an application data path. At least it should if you want maximum return on NVMe investments.


In our last blog we got the Brocade view on how NVMe affects network design. As you might imagine, that conversation was very Fibre Channel-centric. Today we’re looking at the same concept – network design – but bringing in a powerhouse from Cisco.


If you've been in the industry for a while you've probably heard of him: J Michel Metz. Dr. Metz is an R&D Engineer for Advanced Storage at Cisco and sits on the Board of Directors for SNIA, FCIA and NVM Express. So… yeah. He knows a little something about the industry. In fact, check out a blog on his site called “Storage Forces” for some background on our discussion today. And if you think you know what he’s going to say, think again.


Ok. Let’s dig in.


Nathan: Does NVMe have a big impact on data center network design?


J: Absolutely. In fact I could argue the networking guys have some of the heaviest intellectual lift when it comes to NVMe. With hard disks, tweaking the network design wasn't nearly as critical. Storage arrays – as fast as they are – were sufficiently slow that you could just send data across the wire and things were fine.


Flash changed things, reducing latency and making network design more critical, but NVMe takes it to another level. As we’re able to put more storage bits on the wire, it increases the importance of network design. You need to treat it almost like weather forecasting; monitoring and adjusting as patterns change. You can’t just treat the storage as “data on a stick;” just some repository of data at the end of the wire, where you only have to worry about accessing it.


Nathan: So how does that influence the way companies design networks and implement storage?


J: To explain I need to start with a discussion of how NVMe communications work. This may sound like a bizarre metaphor, but bear with me.


Think of it like how food is ordered in a ‘50s diner. A waitress takes an order, puts the order ticket on the kitchen counter and rings a bell. The cook grabs the ticket, cooks the food, puts the order back on the counter and rings the bell. The waitress then grabs the order and takes it to the customer. It’s a process that is efficient and allows for parallel work queues (multiple wait staff and cooks).


Now imagine the customers, in this case our applications, are a mile away from the kitchen, our storage. You can absolutely have the waitress or the cook cross that distance, but it isn't very efficient. You can reduce the time to cross the distance by using a pneumatic tube to pass orders to the kitchen, but someone ultimately has to walk the food over. That adds delay. Again, the same is true with NVMe. You can optimize NVMe to be transferred over a network, but you’re still dealing with the physics of moving across the network.


At this stage you might stop and say ‘hey, at least our process is a lot more efficient and allows for parallelism.’ That could leave you with a solid NVMe over Fabric design. But for maximum speed what you really want is to co-locate the customers and the kitchen. You want your hosts as close to the storage as possible. It’s the trade-offs that matter at that point. Sometimes you want the customers in the kitchen; that’s what hyper-convergence is, but it obviously can only grow so large. Sometimes you want a centralized kitchen and many dining rooms; that’s what you can achieve with rack-scale solutions that put an NVMe capacity layer sufficiently close to the applications, at the ‘top of rack.’ And so on.
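The diner metaphor maps naturally onto NVMe's paired submission/completion queues. Here is a toy sketch of that model; the queue count and workload are illustrative only, and real NVMe queues of course live in the driver and device, not in Python threads.

```python
# Toy model of NVMe's paired submission/completion queues, per the diner
# metaphor: waitresses post order "tickets" (commands) to a submission queue,
# cooks (the device) service them in parallel, and finished work lands on a
# completion queue for the host to collect.
import queue
import threading

submission_q = queue.Queue()  # the kitchen counter: commands awaiting service
completion_q = queue.Queue()  # finished commands awaiting the host

def cook(worker_id):
    while True:
        ticket = submission_q.get()  # "bell rings": grab the next command
        if ticket is None:           # sentinel: shift is over
            break
        completion_q.put((worker_id, ticket, "done"))

NUM_COOKS = 4                 # NVMe allows many parallel queues/workers
workers = [threading.Thread(target=cook, args=(i,)) for i in range(NUM_COOKS)]
for w in workers:
    w.start()

for order in range(8):        # the host submits 8 commands
    submission_q.put(order)
for _ in workers:             # tell every cook the shift is over
    submission_q.put(None)
for w in workers:
    w.join()

results = [completion_q.get() for _ in range(8)]
print(f"completed {len(results)} commands across {NUM_COOKS} parallel queues")
```

The key property the metaphor captures is that many producers and consumers work the queues concurrently, with no one blocking anyone else.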


Nathan: It sounds like you’re advocating a move away from traditional storage array architectures.


J: I want to be careful because this isn’t an ‘or’ discussion, it’s an ‘and’ discussion. HCIS is solving a management problem. It’s for customers that want a compute solution with a pretty interface and freedom from storage administration. HCIS may not have nearly the same scalability as an external array, but it does allow application administrators to easily and quickly spin up VMs.


As we know though, there are customers that need scale. Scale in capacity; scale in performance and scale in the number of workloads they need to host. For these customers, HCIS isn’t going to fit the bill. Customers that need scale – scale across any vector – will want to make a trade-off in management simplicity for the enterprise growth that you get from an external array.


This also applies to networking protocols. The reason why we choose protocols like iWARP is for simplicity and addressability. You choose the address and then let the network determine the best way to get data from point A to point B. But, there is a performance trade-off.


Nathan: That’s an excellent point. At no point have we ever seen IT coalesce into a single architecture or protocol. If a customer needs storage scale with a high-speed network what would you recommend?


J: Haven’t you heard that every storage question is answered with, “It depends?”


Joking aside, it’s never as simple as figuring out the best connectivity options. All storage networks can be examined “horizontally.” That is the phrase I use to describe the connectivity and topology designs from a host through a network to the storage device. Any storage network can be described this way, so it’s easy to throw metrics and hero numbers at the problem: what are the IOPS, what is the latency, what is the maximum number of nodes, etc.


What we miss in the question, however, is whether or not there is a mismatch between the overall storage needs (e.g., general purpose network, dedicated storage network, ultra-high performance, massive scale, ultra-low latency, etc.) and the “sweet spot” of what a storage system can provide.


There is a reason why Fibre Channel is the gold standard for dedicated storage networks. Not only is it a well-understood technology, it’s very, very good at not just performance but reliability. But for some people there are other considerations to pay attention to. Perhaps the workloads don’t lend themselves to a dedicated storage network. Perhaps “good enough” is, well, “good enough.” For them, really great performance with Ethernet to the top-of-rack is perfectly fine, and they don’t need the kind of high availability and resiliency that a Fibre Channel network, for instance, is designed to provide.


Still others are looking more for accessibility and management, and for them the administrative user interface is the most important. They can deal with performance hits because the management is more important. They only have a limited number of virtual machines, perhaps, so HCIS using high-speed Ethernet interconnects is perfect.

As a general rule, “all things being equal” are never actually equal. There’s no shortcut for good storage network design.


Nathan: Let’s look forward now. How does NVMe affect long term network and data center design?


J: <Pause> Ok, for this one I’m going to be very pointedly giving my own personal opinion. I think that the aspect of time is something we’ve been able to ignore for quite a while because storage was slow. With NVMe and flash though, time IS a factor and it is forcing us to reconsider overall storage design, which ultimately affects network design.


Here is what I mean. Every IO is processed by a CPU. The CPU receives a request – a write, say – passes it on, and then goes off to do something else. That process was fine when IO was sufficiently slow. CPUs could go off and do any number of additional tasks. But now, it’s possible for IO to happen so fast that the CPU cannot switch between tasks before the IO response is received. The end result is that a CPU can be completely saturated by a few NVMe drives.


Now, this is a worst-case scenario, and should be taken with a grain of salt. Obviously, there are more processes going on that affect IO as well as CPU utilization. But the basic premise is that we now have technologies that are emerging that threaten to overwhelm both the CPU and the network. The caveat here, the key take-away, is that we cannot simply swap out traditional spinning disk, or flash drives, with NVMe and expect all boats to rise.
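J's premise is easy to sanity-check with back-of-envelope arithmetic. All figures below are assumed orders of magnitude, not measurements:

```python
# Sketch of the saturation argument: when device latency approaches the cost
# of a context switch, the CPU can no longer usefully switch away between IOs.
# All figures are rough, assumed orders of magnitude.

CONTEXT_SWITCH_US = 5.0   # assumed cost to switch to another task and back
HDD_IO_US = 5_000.0       # ~5 ms seek: plenty of time to do other work
NVME_IO_US = 20.0         # a fast NVMe completion

for name, io_us in [("HDD", HDD_IO_US), ("NVMe", NVME_IO_US)]:
    # How many switch round trips fit inside one outstanding IO?
    slack = io_us / (2 * CONTEXT_SWITCH_US)
    print(f"{name}: {slack:.0f} switch round-trips fit in one IO")
```

With a disk there is room for hundreds of task switches per IO; with NVMe there is room for almost none, which is why a handful of drives can pin a CPU.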

In my mind this results in needing more intelligence in the storage layer. Storage systems, either external arrays or hyperconverged infrastructures, will ultimately be able to say no to requests and ask other storage systems for help. They’ll work together to coordinate and decide who handles tasks like an organic being.


Yes, some of this happens as a result of general machine learning advancements, but it will be accelerated because of technologies like NVMe that force us to rethink our notion of time. This may take a number of years to happen, but it will happen.


Nathan: If storage moves down this path, what happens to the network?


J: Well, you still have a network connecting storage and compute but it, too, is more intelligent. The network understands what its primary objectives are and how to prioritize traffic. It also knows how to negotiate with storage and the application to determine the best path for moving data back and forth. In effect, they can act as equal peers to decide on the best route.


You can also see a future where storage might communicate to the network details about what it can and can’t do at any given time. The network could then use this information to determine the best possible storage device to leverage based on SLA considerations. To be fair, this model puts the network in a ‘service broker’ position that some vendors may not be comfortable with. But since the network is a common factor that brings storage and servers together it creates opportunity for us to establish the best end-to-end route.


In a lot of ways, I see end-to-end systems coming together in a similar fashion to what was outlined in Conway’s game of life. What you’ll see is data itself self-organizing based on priorities that are important for the whole system – the application, the server, the network and the storage. In effect, you’ll have autopoiesis, a self-adaptive system.


I should note that what I’m referring to here are really, really large systems of storage, not necessarily smaller host-to-storage-array products. There are a lot of stars that need to align before we can see something like this as a reality. Again, this is my personal view.  


Nathan: I can definitely see why you called this out as your opinion. You’re looking pretty far into the future. What if we pull back to the next 18 – 24 months, how do NVMe fabrics play out?


Nathan: I know. I’m constraining you. Sorry about that.


J: <Laughs> In the near term we’re going to see a lot of battles. That’s to be expected because the standards for NVMe over Fabrics (NVMe-oF) are still relatively new.


Some vendors are taking shortcuts and building easy-to-use proprietary solutions. That gives them a head start and improves traction with customers and mind share, but it doesn't guarantee a long-term advantage. DSSD proved that.


The upside is that these solutions can help the rest of the industry identify interesting ways to implement NVMe-oF and improve the NVMe-oF standard. That will help make standards-based solutions stronger in the long run. The downside is that companies implementing early standards may feel some pain.


Nathan: So to close this out, and maybe lead the witness a bit. Is the safest way to implement NVMe – today – to implement it in an HCI solution and wait for the NVMe-oF standards to mature?


J: Yeah. I think that is fair to say, especially if there is a need to address manageability challenges. HCIS absolutely helps there. For customers that do need to implement NVMe over Fabrics today, Fibre Channel is probably the easiest way to do that. But don’t expect FC to be the only team on the ball field, long term.


If I go back to my earlier point, different technologies are optimized for different needs. FC is a deterministic storage network and it’s great for that. Ethernet-based approaches can be good for simplicity of management, though it’s never a strict “either-or” when looking at the different options.


I expect Ethernet-based NVMe-oF to be used for smaller deployment styles to begin with, single switch environments, rack-scale architectures, or standalone servers with wicked fast NVMe drives connected across the network via a Software Defined Storage abstraction layer. We are already seeing some hyperconvergence vendors flirt with NVMe and NVMe-oF as well. So, small deployments will likely be the first forays into NVMe-oF using Ethernet, and larger deployments will probably gravitate towards Fibre Channel, at least in the foreseeable time frame.





As we closed out our conversation J made a comment about NVMe expanding our opportunity to address customer problems in new ways.


I can’t agree more. In my mind, NVMe can and should serve as a tipping point that forces us, vendors, to rethink our approach to storage and how devices in the data path interoperate.


This applies to everything from the hardware architecture of storage arrays; to how / when / where data services are implemented; even to the way devices communicate. I have some thoughts around digital force feedback, where an IT infrastructure resists a proposed change and responds with a more optimal configuration in real time (imagine pushing a capacity allocation to an array from your mobile phone, feeling the pressure of it resisting, then seeing it respond with green lights over more optimal locations and details on why the change is proposed), but that is a blog for a day when I have time to draw pictures.


The net is that as architects, administrators and vendors we should view NVMe as an opportunity for change and consider what we keep vs. what we change – over time. As J points out NVMe-oF is still maturing and so are the solutions that leverage it. So to you dear reader:


  1. NVMe on HCI (hyper-converged infrastructure) is a great place to start today.
  2. External storage with NVMe can be implemented, but beware anyone who says their architecture is future proof or optimized to take full advantage of NVMe (J’s comment on overloading CPUs is a perfect example of why).
  3. Think beyond the box. Invest in an analytics package that looks at the entire data path and lets you understand where bottlenecks exist.


Good hunting.


NVMe 101 – What’s Coming to the World of Flash?

Is NVMe Killing Shared Storage?

NVMe and Me: NVMe Adoption Strategies

NVMe and Data Protection: Time to Rethink Strategies

NVMe and Data Protection: Is Synchronous Replication Dead?

How NVMe is Changing Networking (with Brocade)

Hitachi Vantara Storage Roadmap Thoughts

An In Depth Conversation with Brocade


As we've discussed over the last several blogs, NVMe is much more than a communication protocol. It’s a catalyst for change. A catalyst that touches every aspect of the data path.


At Hitachi we understand that customers have to consider each of these areas, and so today we’re bringing in a heavy hitter from Brocade to cover their view of how data center network design changes – and doesn't change – with the introduction of NVMe.


The heavy hitter in this case is Curt Beckmann, principal architect for storage networking. A guy who makes me, someone who used to teach SEs how to build and debug FC SANs, feel like a total FC newbie. He’s also a humanitarian, on the board of Village Hope, Inc. Check it out.


Let’s dig in.


Nathan: Does NVMe have a big impact on data center network design?


Curt: Before I answer, we should probably be precise. NVMe is used to communicate over a local PCIe bus to a piece of flash media (see Mark’s NVMe overview blog for more). What we want to focus on is NVMe over Fabric, NVMe-oF. It’s the version of NVMe used when communicating beyond the local PCIe bus.


Nathan: Touché. With that in mind. Does NVMe-oF have a big impact on network design?


Curt: It really depends on how you implement NVMe-oF. If you use a new protocol that changes how a host interacts with a target NVMe device, you may need to make changes to your network environment. If you’re encapsulating NVMe in existing storage protocols like FC, though, you may not need to change your network design at all.


Nathan: New protocols. You’re referring to RDMA based NVMe-oF protocols, right?


Curt: Yes. NVMe over Fabrics protocols that use RDMA (iWARP or RoCE) reduce IP network latency by talking directly to memory. For NVMe devices that can expose raw media, RDMA can bypass CPU processing on the storage controller. This allows faster, more ‘direct’ access between host and media. It does, however, require changes to the way networks are designed.


Nathan: Can you expand on this? Why would network design need to change?


Curt: Both iWARP and RoCE are based on Ethernet and IP. Ethernet was designed around the idea that data may not always reach its target, or at least not in order, so it relies on higher layer functions, traditionally TCP, to retry communications and reorder data. That’s useful over the WAN, but sub-optimal in the data center. For storage operations, it’s also the wrong strategy.


For a storage network, you need to make sure data is always flowing in order and is ‘lossless’ to avoid retries that add latency. To enable this, you have to turn on point-to-point flow control functions. Both iWARP and RoCE v2 use Explicit Congestion Notification (ECN) for this purpose. iWARP uses it natively. RoCE v2 added Congestion Notification Packets (CNP) to enable ECN to work over UDP. But:


      1. They aren't always ‘automatic.’ ECN has to be configured on a host. If it isn't, any unconfigured host will not play nice and can interfere with other hosts’ performance.
      2. They aren't always running. Flow control turns on when the network is under load. Admins need to configure exactly WHEN it turns on. If ECN kicks in too late and traffic is still increasing, you get a ‘pause’ on the network and latency goes up for all hosts.
      3. They aren't precise. I could spend pages on flow control, but to keep things short, you should be aware that Fibre Channel enables a sender to know precisely how much buffer space remains before it needs to stop. Ethernet struggles here.


There are protocol specific considerations too. For instance, TCP-based protocols like iWARP start slow when communication paths are new or have been idle, and build to max performance. That adds latency any time communication is bursty.
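Curt's slow-start point can be sketched in a few lines. The doubling-per-round-trip model below is the classic textbook simplification (real TCP stacks vary in initial window and growth behavior):

```python
# Sketch of TCP slow start: after an idle period the congestion window
# restarts small and roughly doubles each round trip, so bursty traffic
# pays several RTTs before reaching full rate. Window sizes are in
# segments; values are illustrative.

def rtts_to_reach(target_segments, initial_window=10):
    """Count round trips until the congestion window covers the burst."""
    cwnd, rtts = initial_window, 0
    while cwnd < target_segments:
        cwnd *= 2   # classic slow start: double per RTT
        rtts += 1
    return rtts

print(rtts_to_reach(640))  # prints 6
```

Six extra round trips is negligible for a bulk transfer but is exactly the kind of tax that hurts short, latency-sensitive storage bursts.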


Nathan: So if I net it out, is it fair to say that Ethernet and NVMe is pretty complex today?


Curt: (Smiles). There’s definitely a level of expertise needed. This isn't as simple as just hooking up some cables to existing switches. And since we have multiple RDMA standards which are still evolving (Azure is using a custom RoCE build, call it RoCE v3), admins will need to stay sharp. Which raises a point I forgot to mention. These new protocols require custom equipment.


Nathan: You can’t deploy iWARP or RoCE protocols on to installed systems?


Curt: Not without a NIC upgrade. You need something called an R-NIC. There are a few vendors that have them, but they aren’t fully qualified with every switch in production.


That’s why you are starting to hear about NVMe over TCP. It’s a separate NVMe protocol similar to iSCSI that runs on existing NICs so you don’t need new hardware. It isn't as fast, but it is interoperable with everything. You just need to worry about the network design complexities. You may see it ultimately eclipse RDMA protocols and be the NVMe Ethernet protocol of choice.


Nathan: But what if I don’t care Curt? What if I have the expertise to configure flow control, plan hops / buffer management so I don’t hit a network pause? What if R-NICs are fine by me? If I have a top notch networking team, is NVMe over Fabric with RDMA faster?


Curt: What you can say is that for Ethernet / IP networks, RDMA is faster than no RDMA. In a data center, most of your latency comes from the host stack (virtualization can change the amount of latency here) and a bit from the target storage stack (See Figure 1). That is why application vendors are designing the applications to use a local cache for data that needs the lowest latency. No wire, lower latency. With hard disks, network latency was tiny compared to the disk, and array caching and spindle count could mask the latency of software features.  This meant that you could use an array instead of internal drives. Flash is a game changer in this dynamic, because now the performance difference between internal and external flash is significant.  Most latency continues to be from software features, which has prompted the move from the sluggish SCSI stack to faster NVMe.


Figure 1: Where Latency Comes From


I've seen claims that RoCE can do small IOs, like 512 bytes, at maybe 1 or 2 microseconds less latency than NVMe over Fibre Channel when the queue depth is set to 1 or some other configuration not used in normal storage implementations.  We have not been able to repeat these benchmarks, but this is the nature of comparing benchmarks.  We were able to come very close to quoted RoCE numbers for larger IO, like 4K. At those sizes and larger, the winner is the one with faster wire speed. This is where buyers have to be very careful. A comparison of 25G Ethernet to 16G FC is inappropriate. Ditto for 32G FC versus 40G Ethernet. A better comparison is 25G Ethernet to 32G FC, but even here check the numbers and the costs.     
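Curt's caution about comparing link speeds comes down largely to serialization time. The sketch below uses the nominal ("marketing") rates and ignores encoding and framing overhead, so treat the output as a rough comparison only:

```python
# Wire serialization time for a 4 KiB IO at nominal link rates.
# Nominal rates ignore encoding/framing overhead (e.g., actual FC data
# rates differ from the marketing number), so this is only indicative.

IO_BITS = 4096 * 8  # a 4 KiB transfer

links_gbps = {"16G FC": 16, "25G Ethernet": 25, "32G FC": 32, "40G Ethernet": 40}
for name, gbps in links_gbps.items():
    us = IO_BITS / (gbps * 1e9) * 1e6  # seconds -> microseconds
    print(f"{name}: {us:.2f} us on the wire")
```

At these IO sizes the nominally faster wire wins, which is why comparing 25G Ethernet against 16G FC (or 40G Ethernet against 32G FC) stacks the deck.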


Nathan: Any closing thoughts?

Curt: One we didn't really cover is ease of deployment alongside existing systems. For instance, what if you want to use a single storage infrastructure to support NVMe-oF enabled hosts and ‘classic’ hosts that are using existing, SCSI-based protocols? With FC you can do that. You can use existing Gen 5 and Gen 6 switches and have servers that support multiple storage interface types. With Ethernet? Not so much. You need new NICs and quite possibly new switches too. Depending on who you speak with, DCB switches are either recommended, if you want decent performance, or required. I recommend you investigate.



Every vendor has their own take on things, but I think Curt’s commentary brings to light some very interesting considerations when it comes to NVMe.




  1. Ecosystem readiness – With FC (and maybe future Ethernet protocols), NVMe may require minimal to no changes in your network resources (granted, a network speed upgrade may be advised). But with RDMA, components change, so check on implementation details and interop. Make sure the equipment cost of changing to a new protocol isn't higher than you expect.
  2. Standard readiness – Much like any new technology, standards are evolving. FC is looking to make the upgrade transparent and there may even be similar Ethernet protocols coming. If you use RDMA, great. Just be aware you may not be future proofed. That can increase operational costs and result in upgrades sooner than you think.
  3. Networking expertise – With Ethernet, you may need to be more thoughtful about congestion and flow control design. This may mean reducing the maximum load on components of the network to prevent latency spikes. It can absolutely be done, you just need to be aware that NVMe over Fabric with RDMA may increase operational complexity that could result in lower than expected performance / ROI. To be clear though, my Ethernet friends may have a different view. We’ll have to discuss that with them.


Other than that, I’ll tell you what I told myself when I was a systems administrator. Do your homework. Examine what is out there and ask vendors for details on implementations. If you buy a storage solution that is built around what is available today, you may be designing for a future upgrade versus designing for the future. Beware vendors that say ‘future proof.’ That’s 100% pure marketing spin.

An Interview with Bob and Bob

Cat herding by Nathan Moffitt



So here we are, just a few weeks into the life of Hitachi Vantara. Since the formation of Hitachi Vantara there has been a lot of press and activity around everything from infrastructure offerings, to IoT solutions, to the vision of the new company. For those interested in an overview of Vantara there are some great blogs like Mary Ann Gallo’s up on our community and a detailed press release.


These announcements provide a number of insights into who we are and where we are headed, but we recognize there is – and always will be – an insatiable desire to know more. Especially around core development areas like IT infrastructure.


To help satisfy that desire and provide a ‘blunt hammer’ forum to educate certain vendors that can’t read a press release, I sat down to talk storage with two IT infrastructure leaders at Hitachi Vantara: Bob O’Heir, VP of product management, and Bob Madaio, VP of marketing.


<Insert Office Space joke here – I do regularly.>


Moffitt: To start, you’re both responsible for storage and broader IT infrastructure offerings, right?


O’Heir: Is this even a question?


Moffitt: You know why I’m asking.


O’Heir: <Sighs> Yes. We’re not a ‘storage only’ company. We’re a data company. That means we think about the entire infrastructure. Our teams design storage solutions, but we also design integrated systems and management software that optimizes overall IT operations.


Madaio: Agreed. I won’t belabor the point.


Moffitt: Ok. With that in mind, what changes for you with the formation of Hitachi Vantara?


O’Heir: I see a huge opportunity to integrate our infrastructure offerings more tightly with our analytics technologies. Some of this is already in progress, but with the formation of Vantara, our teams are better aligned to co-develop solutions that help customers gain deeper insights from their data.


Madaio: I see the Hitachi Vantara structure enabling us to more easily share ideas and deliver data-driven solutions. Look, each of the entities that makes up Vantara was already working with our customers and each other. Now though, we’re more integrated. It's easier for customers to engage with externally and easier for developers to work together internally. That lets us be faster to market, more customer-friendly and customer-centric, which is a big win for us and customers.


Moffitt: Is storage going to be a key investment area for us as we design new data-driven solutions?


O’Heir: Of course. Storage, and more precisely infrastructure, is a key strength for Hitachi Vantara. Continuing to design products and services in this space makes us a better company to work with because we can bring experience around storing, monitoring, protecting and delivering data to IoT solutions. It also gives us the ability to provide collection points – storage – for helping customers analyze and decide what to save. We can also make storage behave more like an IoT device, which is critical for all IT components moving forward. <Pauses> Think about it like this, does GE quit making equipment now that they do IoT software? No.


Madaio: <Chuckles> Our corporate focus is on helping people activate and leverage their data. Not leveraging our heritage in information storage and data management would be absurd. When comparing us to other industrial powerhouses, having a strong IT business is a pretty unique differentiator. It lets us create differentiated solutions for deriving value from data based on knowing how customers ACTUALLY deploy IT infrastructures. It also provides launch points so customers can start with storage and add OT capabilities as they become more data-driven. Pigeonholing ourselves into one market segment would be a fast path to oblivion.


Moffitt: Let’s continue down this path. How does IT + OT change the evolution of our infrastructure portfolio?


Madaio: It changes our design focus. With the combination of IT, IoT and analytics we’re better able to deliver on customer outcomes versus focusing solely on a new ‘box’ or application. With Vantara we move beyond thinking about form factor and dropping data into a fixed location where it just. sits. Instead we focus on data services that provide a consistent way to access and leverage data anywhere. This is key as the locations where data is born and needs to be analyzed expands.


O’Heir: Exactly. That has a huge impact on how we approach storage in particular. Software-defined storage (SDS) is and will be big for us moving forward. We want customers to consume data services in a very flexible fashion. It might be on a 1U server at the edge or a high-end server / custom built controller at the core which is optimized for maximum performance and uptime. Edge to core analytics will also be a big consideration.


Moffitt: Talk more about that. How do analytics and storage fit together in the new Vantara?


O’Heir: There are two aspects to consider. First, how do you view a storage system as an IoT device? Every storage system is collecting all kinds of telemetry data. By pulling analytics information from it into our Smart Data Center software we can better optimize storage behavior and enable the array to work with other parts of the infrastructure to optimize the entire data path. Of course this also means that the ‘language’ arrays speak changes too.


Moffitt: MQTT (note: a protocol that can be used by IoT devices) for instance?


O’Heir: Yep. That might be for passing information to another infrastructure component or it might be for receiving data from an IoT device. Protocols aren't static. They are always changing, and with Vantara we have the ability to be forward-thinking about how we transmit data. Hitachi Universal Replicator is an example of that thought process. When released it was revolutionary: it used a pull vs. push method to better tolerate outages and reduce bandwidth consumption. With IoT, protocols have to change.


The second aspect is looking at what we can do while we hold the data. If you have the data, why not perform some level of analytics on it? I equate this to the old argument about where you run functions like replication. Yes, you can run them on the application server, but why pull cycles from the host for that? Offload it to an array. The same thing is true of analytics. If data is resident, storage could pull metadata and make predictions about whether to retain the data or just the metadata.


Madaio: Of course, exploiting this data gravity is much easier if we still develop storage. To be sure, we could simply produce the analytics software, but if we provide a fully integrated system and broader infrastructure offerings, we reduce complexity of deployment and acquisition. And we add accretive value. Oh, and I recommend folks check out a recent blog I did on storage as an IoT device. It ties right into this conversation.


Moffitt: Accretive. Good word. It seems like this means there's a blend of our corporate technologies.


O’Heir: It certainly enforces and helps drive where we want to go with simplification of operational processes. When you blend data services and analytic services you reduce the number of resources you deploy and the complexity of optimizing resources. You also open up a broad range of opportunities to deliver value.


Moffitt: Talk about that. What opportunities does this open up?


O’Heir: Well, I don’t want to say much here.


Madaio: Really? You seem like a sharing kind of guy. And a huge Michael Bolton fan.


O’Heir: For my money, I don't know if it gets any better than when he sings "When a Man Loves a Woman". <Pause> Ok, one example. Data ownership and privacy is a growing concern. You need to think about a whole host of new things when you store data. Can analytics be allowed on a data set, is data within proper ‘borders,’ things like that. Having the ability to do some level of base security analytics in storage lets you make decisions about where / how to replicate it, etc. Yes, users can set their own controls, but accidents can happen. If storage can help prevent missteps in data handling, everyone wins.


Madaio: The key thing for me in all this is that these are things every storage vendor will need to consider unless they want to become a ‘just a bunch of disks’ provider. Storage design must change if vendors want to add value.


Moffitt: Let’s close out. Talk to me about what storage looks like in 5 years.


O’Heir: Timelines are tough, but directionally I think storage providers will need to think about traditional items like performance, capacity and resiliency as well as analytics facilitation. With all the data being produced from edge to core you’re going to need every system that retains data to be mining it. I talked to a financial institution recently that is very concerned about this. They see an explosive growth coming in the number of data points they have to gather every few microseconds. Microseconds. How do you process all of that? Yes, you can do edge or cloud processing, but why not in the storage? For some architectures that may be an imperative.


That leads to fundamental architecture design changes that I think, hope, all infrastructure vendors are considering - microservice architectures. If you have the ability to insert an analytic function into an array, then data scientists can develop and run analytics from wherever the data repository is.


And Of Course There Was More.


In every blog I write – personal or interview – a lot of detail ultimately ends up on the cutting room floor. In the case of this interview we had to scrap some conversational elements around NVMe, Microservice architectures, data versus control plane functionality and even what competitors are still standing in 5 years.


If you’re interested in that detail, let me know. We might be able to create a part 2 for this blog. We could even pull in another smart VP of product management, Bob Primmer. Why does Hitachi Vantara have so many Bobs? Well, that is a blog in itself.


To close, for now, here are the takeaways I’d point out.


  1. Hitachi Vantara is still developing storage (shocking I know), but long term it may not look like the storage you know today. We see opportunities for massive innovation.
  2. Innovating across the entire infrastructure – not only storage – is critical for vendors to stay relevant and deliver maximum value to customers.
  3. Analytics and infrastructure are blending together; expertise in one allows you to develop more impactful solutions in the other.


Hopefully you found this blog enjoyable. If so, let me know! Until next time. Bob and Bob, thank you.


As we've been discussing in recent blogs, NVMe holds the promise of lightning fast data transfers where business operations complete in microseconds, enabling faster decision making, richer business insights and better customer experiences.


To achieve that promise, though, you have to be mindful of how your data path is designed. Value added services that consume resources or need to be processed as part of a data transfer can affect the benefits of NVMe. One value added service that seems like a stop light in front of your NVMe sports car is synchronous replication, a key tool for preventing data loss if a system or site goes offline.


Which leads us to an important question. In a world of microsecond latency NVMe transactions, is it time for synchronous replication to get sent to the rubbish heap?



Figure 1: Common Synchronous Replication Workflow

Synchronous Replication. Safe But ‘Slow.’


Quick review: synchronous replication can be defined as a business continuity process that mirrors IO between two systems to prevent data loss and downtime. As shown in figure one, the traditional practice is as follows:


  1. Server sends an IO to a primary storage array
  2. Primary array mirrors IO to a target array
  3. Target array acknowledges IO is received
  4. Primary array responds to server


I’ve truncated things, but that should cover the basics.
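As a rough illustration, the four steps above can be sketched in Python, with `time.sleep` calls standing in for wire and media latency. The timings are illustrative assumptions, not measurements from any product:

```python
import time

def mirror_to_target(io, link_rtt_s):
    """Steps 2-3: ship the IO to the target array and wait for its
    acknowledgement; the sleep stands in for round-trip wire latency."""
    time.sleep(link_rtt_s)
    return "ack"

def synchronous_write(io, link_rtt_s=0.001, local_commit_s=0.0001):
    """Steps 1-4: receive the IO, mirror it, and only then answer the host."""
    start = time.perf_counter()
    time.sleep(local_commit_s)              # step 1: primary commits locally
    ack = mirror_to_target(io, link_rtt_s)  # steps 2-3: mirror and wait
    assert ack == "ack"                     # target confirmed receipt
    return time.perf_counter() - start      # step 4: respond to the server

elapsed = synchronous_write(b"payload")
print(f"host saw the write complete after {elapsed * 1000:.2f} ms")
```

Note that the host's clock keeps running through the entire mirror leg, which is exactly where the latency discussion below comes in.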


The challenge with synchronous replication comes from how long it takes to send IO from the primary array to a target array, get an acknowledgement back and then tell the server the IO has been successfully committed. The amount of time depends on several factors including distance, routing and line quality. For this article let’s assume that 100KM of distance adds roughly 1 millisecond of latency.


WHY YOU CARE: 1 millisecond might not seem slow, but with NVMe… it could be considered a snail’s pace. Since certain NVMe implementations can theoretically transfer data in the sub-100 microsecond range, 1 or more milliseconds can easily translate to a 10x slow down, crushing the value of NVMe. Distance does not make the heart grow fonder with NVMe.
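The arithmetic behind that claim, using the article's round numbers (both figures are illustrative ballparks):

```python
# ~1 ms of synchronous mirror latency per ~100 km of distance, vs a
# sub-100 microsecond NVMe transaction. Both numbers are rough estimates.
nvme_latency_us = 100          # optimistic NVMe transaction time
replication_penalty_us = 1000  # ~100 km synchronous mirror leg

total_us = nvme_latency_us + replication_penalty_us
slowdown = total_us / nvme_latency_us
print(f"effective latency: {total_us} us ({slowdown:.0f}x slower)")
# The mirror leg, not the media, dominates the host-visible latency.
```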


Note: Read IO are serviced by the primary storage array and no data needs to be sent to the target. For read-intensive applications the impact on NVMe performance will be lower, but unless your workload is read-only… you need to consider the effects.


Value Add or Speed Impediment?


At this point you might be thinking: yes, synchronous replication is a speed impediment and not worth using. For certain use cases, you might be right. Speed demons (IT teams pulling data from drones; data scientists crunching numbers) may see no need for synchronous replication. Of course, they’re also likely to get rid of almost any feature that sits in the data path, as we discussed in previous blogs.


But for business operations where loss of any data could impact financial viability / result in lawsuits, it is better to have a little slow down than risk losing data. For that reason I find it hard to believe synchronous replication will go away.


The question will instead be how we redesign our approach to synchronous replication so that latency is minimized (eliminating latency is hard… for now). There are several ways this could play out:


  • Predictive replication. Synchronous replication occurs as usual, but the source system analyzes previous IO data (RTT, etc.) to determine if it should wait for an acknowledgement from the target system before responding to the host. If IO has been consistently stable for x amount of time, this more semi-synchronous method might be acceptable for a limited number of IO transfers.
  • Host side replication. While this doesn't resolve the overall latency of transferring data to a remote site, it does ‘cut out the middle man.’ Having a host manage replication would eliminate the latency of having the source array broker IO and, if paired with predictive replication, could improve IO response times while preserving data protection.
  • Staging or tiering. In this instance, synchronous replication is not performed on initial data capture (hold on, it makes sense). Instead it is performed after initial processing. For workloads where raw data is captured, processed and only the ‘output’ is saved, synchronous replication can and should wait until the final product is created. That lets you get both speed and security. And to be fair, this strategy aligns very well with workloads where NVMe adds most value.
  • Faster networking. It is feasible that we could have faster connectivity across metropolitan sites, enabling at least semi-synchronous replication to make it into strategies. Physics will continue to be an issue (insert PM joke on requesting a change to the speed of light) but with higher quality links and even caching stations, latency could be beaten into some level of submission - without having to fold space & time.
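To make the first idea concrete, here is a hedged Python sketch of the predictive-replication decision: acknowledge the host early only while recent round-trip times have been stable, and cap how many IOs may be outstanding before the target confirms. The class, thresholds and window size are all invented for illustration; no shipping product is claimed to work this way.

```python
from collections import deque
from statistics import pstdev

class PredictiveReplicator:
    """Sketch: allow early (semi-synchronous) acks while the link is
    stable; fall back to strict synchronous mode when RTTs get noisy."""

    def __init__(self, window=32, max_jitter_ms=0.1, max_early_acks=8):
        self.rtts = deque(maxlen=window)   # recent round-trip samples
        self.max_jitter_ms = max_jitter_ms
        self.max_early_acks = max_early_acks
        self.outstanding_early = 0         # early acks awaiting target ack

    def record_rtt(self, rtt_ms):
        self.rtts.append(rtt_ms)

    def can_ack_early(self):
        # Need a full window of samples before trusting the link at all.
        if len(self.rtts) < self.rtts.maxlen:
            return False
        # Cap the number of IOs acknowledged before the target confirms.
        if self.outstanding_early >= self.max_early_acks:
            return False
        # Stable link = low jitter across the recent window.
        return pstdev(self.rtts) <= self.max_jitter_ms

r = PredictiveReplicator(window=4)
for rtt in (1.00, 1.01, 1.00, 1.02):   # steady ~1 ms link
    r.record_rtt(rtt)
print(r.can_ack_early())   # stable -> semi-synchronous mode allowed
r.record_rtt(5.0)          # latency spike
print(r.can_ack_early())   # jittery -> strict synchronous again
```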


WHY YOU CARE: While some of this is theoretical, it does demonstrate that synchronous replication can / could be combined with NVMe to deliver performance and business continuance if strategies are adjusted. It also demonstrates that storage solutions can change to better leverage NVMe. Coupling host side data services with traditional AFAs or software-defined storage would be a perfect example of this. Similarly, software-defined networking with NFV service insertion could provide a model to follow.




When all-flash arrays were introduced a number of folks said synchronous replication was dead because it impacted the overall speed of data transfers. And yet, synchronous replication continues to be deployed. Startup AFA vendors have even adopted it to compete in the enterprise.


With NVMe the same thing is likely to be true. Yes, some types of solutions (rack-scale flash) may eschew synchronous replication because it doesn't fit their target workloads. But for vendors that serve enterprise workloads, expect synchronous replication to remain – just optimized for NVMe.


From a Hitachi perspective, we are embracing both paths: high performance solutions that don’t necessarily need synchronous replication because of the workload type as well as enterprise solutions that include synchronous replication because of the criticality of business continuance.


For this second area, we could slap together some communication protocols, PCIe and high speed interfaces on an existing system, but that isn’t in our DNA. Instead, we’re taking the time to examine the best approach to delivering NVMe so that we don't sell our customers one thing and then immediately refresh it with something else because we’ve optimized software and hardware for critical functions like synchronous replication. That may mean we aren’t first to market, but it does mean we’ll have the industry’s most resilient offering in market. One that supports the next generation of IoT, analytic driven smart data centers.


Other blogs to read:


NVMe and Me (The Journey to NVMe)

NVMe 101 – What’s Coming to the World of Flash?

Is NVMe Killing Shared Storage?

If there is one thing that is certain in life, it’s that nothing is static. Change is inevitable. And with change comes the need to rethink the way we approach… everything.


Case in point: NVMe. As previous blogs covered, NVMe and other flash advancements are forcing us to re-examine the way we architect IT solutions. Application IO stacks, networks, storage software and a host of other items can – and probably will – be tweaked or totally redesigned to get maximum value from NVMe.


This includes data protection. Initial press for NVMe has been around increased IOPS and lower latency, but if you read between the lines you’ll see it changes how data is safeguarded.


Why? Because data protection isn’t a zero add process. It requires your storage array to do ‘stuff,’ and that ‘stuff’ either takes resources – which NVMe desperately wants to consume – or time – for which NVMe has no patience. In fact, NVMe is already tapping its foot and looking at its watch!


The end result is that as vendors and IT leaders optimize for NVMe, data protection implementations will change. And that not only affects your IT best practices, it affects how vendors design storage solutions. So buyer be aware, it’s time to put on that thinking cap.



A good place to start looking at this topic is with the flash media where data lives. Luckily, NVMe offerings tend to have similar uptime metrics when compared to current SSDs. That includes MTBF and DWPD. Still, NVMe media isn’t fully mainstream yet and you should consider:


  • Dual Porting: Provides 2 connections from media to backplane. If one fails, the other allows IO to continue. Dual porting doesn’t impact speed, but it does affect price. Depending on who you ask, dual port NVMe drives are 20%, 50% or more expensive than current generation SSDs.


  • Hot Swap: Allows you to pull a drive and replace it while your storage system is running. Again, this doesn’t impact speed, but it does affect uptime. Check this carefully. Hot swap testing at plug-fests is revealing that hot swap of PCIe NVMe doesn’t always work.


WHY YOU CARE: NVMe drives are still coming of age and this can affect future-proofing. If your vendor uses single port drives to reduce costs or does not support hot swap, a refresh may be in your near future.



The next step up the data protection stack is RAID or more advanced forms of erasure coding (EC). Both enable storage solutions to continue serving data if one or more SSDs fails, but they are ‘expensive.’


RAID 1 mirroring is fast but cuts capacity in half. RAID 6 and advanced EC minimize the capacity tax but add processing overhead that slows down IO. In fact, maximum throughput can easily be cut in half and latency doubled. As a result, expect NVMe vendors to recommend you switch to RAID 1 / 10 to lower overhead.
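A quick back-of-the-envelope comparison of the capacity tax, assuming a hypothetical shelf of 24 x 8 TB drives and a single RAID 6 group (drive counts and sizes are made up for illustration):

```python
# Usable capacity under the two schemes discussed above.
drives, drive_tb = 24, 8
raw_tb = drives * drive_tb

raid1_usable = raw_tb / 2               # mirroring: 50% capacity tax
raid6_usable = (drives - 2) * drive_tb  # one group, 2 drives' worth of parity

print(f"raw capacity:     {raw_tb} TB")
print(f"RAID 1/10 usable: {raid1_usable:.0f} TB")
print(f"RAID 6 usable:    {raid6_usable} TB")
# RAID 6 keeps ~92% of raw capacity but pays for it in parity math on
# every write -- the processing overhead the paragraph above warns about.
```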


WHY YOU CARE: For many organizations, this will change IT best practices. It may also increase overall solution costs. It may also signal that your vendor’s RAID / EC strategy may change. Read on.


Don’t expect vendors to throw out RAID 6 and EC just yet though. Instead, expect to see new ways of implementing these technologies including:


  • Offload. Accelerating tasks by offloading them to an add-on card or separate set of resources can reduce overhead, but it is likely to necessitate a new (or additional) hardware controller.
  • Change the location. This is what some startup NVMe players in the analytics space are doing. Rather than put the burden on the array, they move it to the host. Hosts that need advanced resiliency will carry a burden, but hosts that don’t will see improved speeds.
  • Rewrite the algorithm. In theory, you can streamline code and release it via a software update. The downside is that you would have to create new RAID volumes and move existing data to the new volumes. That can be transparent, but it takes extra ‘swap capacity.’




WHY YOU CARE: Each approach has its merit depending on workload, but all have impact. Adding an offload function will require a tech refresh (impacts future-proofing). Moving tasks to the host requires a rework in strategy and a new set of products (more on this in future blogs). Rewrite of code forces rebuilding data volumes (impact to IT operations and the need for more storage).



If we continue on to advanced data protection services like snapshotting and replication, a similar line of questioning occurs: how much latency will the data service introduce? If the impact on latency and IO is high, do you turn off a service?


For most data it is unlikely you’ll turn off data protection. But other data services, like deduplication and compression, that eat up resources and can sit in the data path adding latency to IO? Hmm… That is a story for another day. So what is an administrator to do? Let’s look at 2 common DP functions.


Snapshots: For important data that drives business operations, snapshots for backup are key. You should be aware, though, that they do consume resources and can add latency to IO. Keep in mind:


    • The snapshot methodology will influence overhead. The more writes, the more overhead.
    • If you have to quiesce a data set (e.g. with a consistency group) latency will go up.


To minimize the impact snapshots have on performance you can change the method, avoid quiescing or change the frequency so potential impact is infrequent.
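A toy model shows why methodology and write rate matter. Assuming a copy-on-write style snapshot, each first write to a snapshotted block triggers a copy (an extra read plus write) before the host IO lands; the numbers below are made up purely for illustration:

```python
def cow_extra_ios(writes, first_touch_fraction):
    """Estimate extra backend IOs for copy-on-write snapshots: each
    first write to a protected block costs one extra read + one extra
    write. Both the model and the inputs are illustrative."""
    first_touches = int(writes * first_touch_fraction)
    return first_touches * 2

host_writes = 100_000
for fraction in (0.05, 0.25, 0.75):
    extra = cow_extra_ios(host_writes, fraction)
    print(f"{fraction:>4.0%} first-touch rate -> {extra} extra backend IOs")
```

The more (unique) writes, the more overhead, which is why lowering snapshot frequency or changing the method shrinks the tax.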


Asynchronous Replication: Asynchronous replication creates a remote copy of a data set for recovery in the event of a site failure. It does not usually sit ‘in the data path,’ meaning it can happen after an IO is processed. This avoids potential impact to NVMe latency. That said, replication does require CPU cycles and could steal resources, impacting maximum performance.


Similar to snapshots, you can minimize any impact by changing the amount of resources used and the frequency of transfers.
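The 'out of the data path' point can be sketched with a queue and a background worker: the host gets its acknowledgement immediately, and the wire latency is paid on the replicator's clock rather than the host's. A minimal illustration, not a real replication engine:

```python
import queue
import threading
import time

pending = queue.Queue()   # writes awaiting transfer to the remote site
shipped = []              # what the remote site has received

def replicator(link_delay_s=0.001):
    """Background worker: drains the queue on its own schedule."""
    while True:
        io = pending.get()
        if io is None:            # shutdown sentinel
            break
        time.sleep(link_delay_s)  # wire latency, off the host's clock
        shipped.append(io)

worker = threading.Thread(target=replicator)
worker.start()

def host_write(io):
    pending.put(io)  # enqueue for later transfer...
    return "ack"     # ...and acknowledge without waiting on the link

for i in range(3):
    host_write(f"io-{i}")

pending.put(None)
worker.join()
print(shipped)  # ['io-0', 'io-1', 'io-2']
```

Throttling the worker (frequency, batch size, CPU budget) is exactly the 'amount of resources used' knob mentioned above.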


Note: Synchronous replication is a bird of another feather and will be reserved for a future blog.


WHY YOU CARE: Understanding how many resources are required for data protection tasks and if they sit in the data path may change a number of things including the number of workloads / size of data sets you host per array. This is why it is critical for solutions to allow ANY data management service to be toggled on or off. It is also why you should expect future NVMe-optimized solutions to have new forms of QoS for fencing resources – not just by host, but by service.



At a high level the big take away here is that NVMe isn’t something you just plug into your system and say ‘go!’ To get the promised increases in throughput and low latency it is imperative we consider how IO is influenced by data services, especially data protection.


When buying NVMe solutions you’ll want to consider:



  1. What influence do my data protection services have on throughput and latency?
  2. Do I need to change my approach to RAID, snapshots and replication to improve IO?
  3. Do I need to reduce the amount of workloads I host per array or ‘beef up’ my array?
  4. How is my vendor responding to NVMe and will it require a new controller or software?
  5. Should I consider a totally new approach to where data services run?


These questions are key to determining how you adjust your strategy, how you size an NVMe storage purchase AND if you are buying a solution that may be out of date soon.


As we discussed, be sure to look for an offering that uses dual ported, hot pluggable media; a RAID / EC strategy that is upgradeable (doesn’t require a new controller); selectable data services; etc. There are other considerations (networking for instance), but that is a topic for another blog.


You’ll also want to be wary of marketing numbers that claim ‘100 microsecond latency’ but don’t tell you what data services (RAID, replication, snapshots or others) are running. Because, as we all saw with the initial wave of flash offerings, there was the speed you saw on day one and the speed you saw on day 30. And boy, were they different.



You've heard it before, I know. Companies need to take advantage of digital transformation in order to achieve next level growth. But digital transformation is *not* a light switch - where you can just flip it on and voila, you're done.


Digital transformation uses technology to enable efficiency, differentiation and innovation, to the benefit of and for all aspects of an organization’s activities and processes.


But before an organization can realize the full benefits of digital transformation, they still need to “take care of business” from an IT services delivery perspective.


IDC did a recent study of IT staff teams, and the results are interesting.


IDC survey data indicates that 45% of IT staff time is taken up by routine operations like provisioning, configuration, maintenance, monitoring, troubleshooting, and remediation whereas only 21% is allocated to innovation and new projects.


Many routine tasks are automatable and others may be dramatically simplified and streamlined.


First Step - The very first step in simplification and streamlining is the ability to meet business demands by setting up additional resources to serve clients - and in our topic today, that resource is storage: the Hitachi Virtual Storage Platform.


With the industry’s leading 100% data availability guarantee – one could understandably believe that configuring such a robust system would be very involved and complex – and dare I say it, may even involve the rudimentary, but much loved CLI (Command Line Interface). CLI provides control and power, yes. But simple and intuitive it is not.



6 Steps to Configure a VSP, with Hitachi Storage Advisor.


Some background info for readers new to Hitachi - we developed Hitachi Storage Advisor (HSA) in response to customer input for an intuitive, graphical user interface to easily manage our VSP platforms. HSA accomplishes this by delivering a simplified, unified approach to managing storage.


  1. Quick and simplified VSP storage deployment – HSA accomplishes this by abstracting complex unified management tasks for both block and file requirements into fewer and less complex steps
  2. It saves time by using wizard-driven workflows that enable storage management operations to be completed faster
  3. And yes…<6 steps…to configure and provision a VSP for SAN & NAS deployments




Realizing our customers may want to deploy VSP in both SAN & NAS environments, we designed HSA to manage both of these…easily.

Hitachi Storage Advisor can configure, provision and (locally) protect a VSP in <6 steps for both block and/or file workloads. Here’s how:

  1. First Discover the storage array
  2. Create and initialize parity groups – HSA automates this based on best practices
  3. Create either block or file pools
  4. Create hosts
  5. For block; engage the create volume, attach volume and protect volume multi-operation workflow
  6. For file;  create file system, share and exports
  7. And that’s it..done


**Notice that the single workflow in step #5 not only creates volumes and provisions them but can also (optionally) protect them.**


In addition, this single “federated” workflow can provision from ANY of the discovered/on-boarded data center-wide storage systems.


Via a single HSA console, HSA can manage VSPs deployed in the data center and remote locations.


A single HSA instance can manage up to 50 storage systems, be they local, remote or a combination of local and remote. If you want remote staff to manage your systems, that’s also ok, as HSA was designed for non-storage experts so that almost anyone can  configure and provision a storage system, as well as monitor capacity utilization, and other storage management issues without having to go back to the experts.
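For readers who think in code, the six-step sequence could be sketched as below. To be clear, the `HsaClient` class and every method name in it are hypothetical, invented purely to show the order of operations; this is not the HSA API or any real management interface.

```python
class HsaClient:
    """Hypothetical stand-in for a management endpoint. Every method
    here is invented to illustrate the workflow order, nothing more."""

    def __init__(self):
        self.log = []  # record the steps in the order they ran

    def _step(self, name):
        self.log.append(name)
        return name

    def discover_array(self, address):   # 1. discover the storage array
        return self._step("discover")

    def create_parity_groups(self):      # 2. best-practice parity groups
        return self._step("parity-groups")

    def create_pool(self, kind):         # 3. block or file pool
        return self._step(f"pool:{kind}")

    def create_host(self, name):         # 4. create hosts
        return self._step("host")

    def provision_block(self):           # 5. create + attach + protect
        return self._step("create+attach+protect")

    def provision_file(self):            # 5 (file): filesystem + share/export
        return self._step("filesystem+share+export")

hsa = HsaClient()
hsa.discover_array("10.0.0.10")          # hypothetical management address
hsa.create_parity_groups()
hsa.create_pool("block")
hsa.create_host("app-server-01")
hsa.provision_block()                    # a file workload would call
print(hsa.log)                           # provision_file() here instead
```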


Hitachi Storage Advisor also simplifies complex tasks by abstracting/hiding these complexities with intuitive, easy-to-use workflows that are based on best practices. For example, HSA auto-categorizes drive capacities into tiers, which enables one-click parity group creation and initialization. These tiers are then leveraged to further simplify the pool creation operation.


Note: when creating pools, HSA (in the background) makes sure that pool creation best practices are adhered to. This protects customers from creating pools with too few PGs, for example.


Hitachi Storage Advisor is a key first step in helping customers realize the full benefits of digital transformation. By greatly simplifying initial VSP deployment and on-going management, customers can re-direct IT resources to more impactful and strategic projects that will help them achieve the full benefits of digital transformation for their business.


Stay tuned for more data center management related blogs to come.


Thanks for reading.




When we have seen what Flash has done to change our businesses in such a short space of time, it is no wonder that the talk around NVMe-based solutions makes for such a high-performance and exciting proposition. Just one tiny, small detail… how on earth do you move your business planning to make use of this technology? Oh… and why should you care?!


Now, I used to be a customer, an IT leader, much like many of you who are reading this blog. I used to be someone who loved to live on the bleeding edge of solutions, although this was more often than not dictated by my industry.


I worked in Motorsport for 10 years, but the most challenging had to be my time in Formula 1. I arrived at the dawn of downsizing and consolidation, the death of the big 3.0Ltr V10 engine and the introduction of the 2.4Ltr V8 (19,000 RPM and 800HP). One slight issue did arise: the move from a V10 to a V8 changed the vibration resonance in the garage when the cars were started… in fact it was just enough to make every spinning HDD in our garage wobble... ahhh, the Blue Screen of Death and no telemetry… I wasn’t a very popular chap and I didn’t get much sleep that week in Bahrain, I can tell you!




So, in 2006 (goodness I feel old) I went all flash… in EVERYTHING. This did have another effect… WOW, our application performance really had been supercharged! On the other hand, my CIO nearly had a heart attack when we needed more than the predicted budget for the year. But this was for the business; we were there to win races and championships, and it’s no good if you cannot even start the car!


As we look at the dawn of the next generation of flash storage coming with NVMe, I am getting excited all over again about how this can really take businesses to the next level with the performance on demand from NVMe over PCIe. I just have this little niggle, like the first time someone said I should try “HD movies” (sorry, I was too young for the Betamax vs VHS saga): do you want HD DVD or Blu-ray? I certainly do not want to be stuck with HD DVD!





Let’s take a look at what we need to take into account before we jump straight into NVMe Flash:


1. Multiple form factors – Right now a standard hasn’t really been decided; of course NAND will be custom built by vendors, but we also have the flash manufacturers wading in. Just look at NGSFF SSD from Samsung vs Intel’s Ruler SFF-TA-1002. Does anyone want to be locked into a single vendor without a certified standard?


2. Architecture choices - the main change from SAS to NVMe is a direct path to the processor, either for the compute host or storage controller. Which makes the most sense? Hyper-converged or SAN? Either way, these are point solutions; how can I use both? You can read more on this topic here.


3. Impact of data services – to get the best performance, latency is going to be the enemy at every point. So data reduction and replication services are almost out of the question for the time being. Long live tiering!


4. Evolving connectivity standards – Eventually we are going to want to extend performance outside of the array or between hosts using fabric connectivity… we don’t want a bottleneck in the network, so you are going to need a 100Gb/s network… but there is a mix of options here, and what do you choose? Omni-Path, RoCEv2, NVMe over FC… which protocol do you choose? To make matters more interesting, some vendors are using proprietary networking protocols. Whichever way, you are going to have to replace your current network.


5. Limited scalability - the limit today is 10s of devices on a PCIe bus and not the many hundreds that can be found in SAS-based AFAs. Scale-out can help resolve this challenge, but it is expensive to add controllers when you only need more capacity, and the communication latency between nodes will significantly reduce the performance advantage that NVMe was designed to deliver. Future PCIe switching designs may help address the scalability issue, but they aren’t ready yet.


Ummm – so all is not as it seems out of the box… we need to make a few decisions. It seems like the unknowns at present could leave us stuck with a solution that isn’t upgradable in the future, or even a dead form factor. So how can I future-proof myself?



What options do we have?


In the first instance, you can take a standard SAN array that is making use of NVMe backend technology to accelerate your workloads. This would allow you to supercharge your storage solutions while making use of tiering technology to use multiple types of flash in one solution. This would then enable you to consolidate your storage performance classes onto one solution… who said tiering was dead? What is cool at this point is that you can then focus on automating the workloads between the classes of storage, hypervisor (because we all need more than one) and compute, while at the same time making use of enterprise replication and data reduction services.


The second option is to look at scale-out hyper-converged solutions using the Skylake processors from Intel. Having the workload able to access the storage and compute on the same silicon makes for incredible response times. Also, as workloads are localized on a per-node basis, we can really make use of PCIe flash as a caching layer before committing our writes into enterprise SAS flash. The cost of growth is granular too; it’s just another server appliance, so you can pay as you grow rather than make a high upfront investment.


Both options have one major issue as highlighted earlier… you are going to have to replace your network to really make use of the technology. In both instances, we are moving the bottleneck around… you are going to need a 100Gb network! This is a real cost in making real use of the very high performance on offer from NVMe. Again, there is not a clear direction in which the market is going to go… I mean, how many of you have FCoE deployed?
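Some quick math on why the network becomes the bottleneck, assuming a ballpark of ~3 GB/s per NVMe drive (an assumed figure for a PCIe Gen3 x4 SSD, not a spec):

```python
# A handful of NVMe drives can saturate slower links.
drive_gbytes_per_s = 3.0                    # assumed per-drive throughput
drive_gbits_per_s = drive_gbytes_per_s * 8  # ~24 Gb/s on the wire

for link_gbps in (10, 25, 100):
    drives_to_saturate = link_gbps / drive_gbits_per_s
    print(f"{link_gbps:>3} Gb/s link saturated by "
          f"~{drives_to_saturate:.1f} NVMe drives")
```

Even a 100Gb link is filled by roughly four such drives running flat out, which is why the fabric has to be part of the NVMe plan.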




The right Software Defined strategy could be the answer


As we are already aware from SDS and SDN, decoupling the software from the hardware gives us the flexibility to make use of whatever hardware platform is right for each workload. The main outcome, especially while there are technology and topology unknowns, is that we keep delivering constant SLAs to the business by automating the data center with software.


Do you think Amazon or Google give a bugger about what hardware they use? NOPE!


One thing I do want is enterprise-level quality from my software; I do not want another point solution, as that would defeat the object! The software must span the data center, from storage arrays to hyper-converged, and must be hypervisor and cloud agnostic… every customer I am speaking to is exploring every option to reduce cost and consolidate their tools.


The ideal solution is where I can still buy my storage platform with NVMe backend for my key business applications… but I also have the same storage software running on a scale out x86 platform using NVMe Caching for my virtualized workloads. In fact, what is stopping me from also having the same storage software running in the public cloud to offer me instant DR at any point?!


The same can also be said for my fabric connectivity. I could use NVMe over FC for my storage arrays and RoCEv2 for my hyper-converged stack, but if these are flat topologies and the intelligence is sitting in software, you are not tied to any vendor.


Opening our eyes to the right Software Defined strategy will ultimately give your business the flexibility to make use of whatever technology is available and not lock you into any hardware solution, form factor or standard.



In Summary


We can see how a strong software defined strategy can prevent pain further down the line, but does this stop us making a technology decision today? What do you do right NOW…?


While standards are being defined for new technology, we can make use of NVMe caching today to accelerate our virtualized workloads without any uplift. The perfect option would be Hitachi Vantara UCP solutions, as this allows you to scale your application performance and agility without locking you into one technology. You can get more information on these solutions here.


This gives your business the breathing room to deliver a killer software defined strategy that will enable real digital transformation. Best part… the UCP platform is already software defined, so you have already started your journey!


Keep moving forward!






Read more in this series:

NVMe 101 – What’s Coming to the World of Flash?

Is NVMe Killing Shared Storage?