
Storage Systems


Lightning fast flash with the loving hug of data protection

 

Given what flash has done to change our businesses in such a short space of time, it is no wonder that high performance and data reduction take up all the air time. After all, it’s an exciting proposition! What gets somewhat diluted in the conversation is the need for a modern approach to data protection for business workloads. In recent times we have seen regulatory changes, such as the General Data Protection Regulation (GDPR), that customers cannot ignore. With penalties of up to 4% of annual turnover, this is not to be sniffed at.

 

Hitachi Vantara has an exciting and modern approach to data protection; for an interesting read, see Rich Vining’s blog on overcoming the risk that redundant personal data brings under GDPR. So what additional innovations can we bring to the table, especially for our large customers who run mission-critical applications on our solutions?

 

With that, I am joined by Ken Kawada, who is responsible for our Virtual Storage Platform (VSP) G/F1500 enterprise storage solutions, to talk about our new Flash Module Drive (FMD).

 


 

Ian: Thanks for joining me, Ken! Could you give me some more insight into the data protection innovations you are bringing to market for the VSP G/F1500?

 

Ken: No problem, Ian. As you know, our engineering team in Japan holds over 350 unique flash patents for our FMD technology. Today, July 16th, 2018, we announce general availability of the new FMD with embedded encryption for data at rest. Customers can now offload not only data compression but also encryption to the microprocessors on the FMD HDE, freeing up the storage controllers with no impact on performance. They can turn on compression and encryption and just forget about it!

 

 

Ian: So what exactly does it mean for an FMD to have embedded encryption?

 

Ken: Think of the FMD HDE as a bank vault. When you write blocks of data to the array, it stores them like money deposited in a bank. The vault is completely impenetrable, and the only way for money to get in or out is through the vault door. Yet no matter how secure it is, the vault door is useless unless someone remembers to lock it. The controller on the FMD HDE is arguably the most critical component: it acts as the key master, and without it and the key, the array cannot open the vault door. Authentication is like locking and unlocking that door.

 

 

Ian: Sounds rather snazzy, but would customers find it easier to encrypt further up the stack at the application layer rather than in a storage solution?

 

Ken: That is always an option for some, but the risk for truly business-critical workloads is adding unwanted latency into an application at the top of the stack. The beauty of the FMD HDE is that customers can leave this to us: we are offering a FIPS-ready solution with support for multiple third-party KMIP tools and vendors, while still delivering the 2:1 data compression savings customers are used to on FMD, with no performance impact. How many other vendors can offer 4.8M IOPS with no impact from compression and encryption?

 

 

Ian: Talk to me more about that, what do I need to support these new wonder drives?

 

Ken: From our analysis of our customer base, 34% of all FMD capacity shipped on our VSP G1000 and G/F1500 sits behind encrypted back-end (BED) controllers, so high-end customers are our priority. This coincides with the launch of SVOS RF for the G/F1500, which allows full use of the features of the FMD HDE. Customers can choose to use Hitachi for key management or utilize a third-party vendor, so they are not locked in. They just need a one-time frame license to support the FMD HDE on their arrays, and that’s it!

 

 

Ian: Third-party key management sounds like it gives customers a lot of choice. Which vendors do we support for the FMD HDE?

 

Ken: The idea is to really give customers choice. Many of our customers are already invested in a key management solution, so we wanted to make sure we could integrate with those vendors and make our customers’ lives easier. Obviously customers can choose Hitachi, but we also support Gemalto SafeNet, Thales keyAuthority, HPE Enterprise Secure Key Manager and IBM. Customers may also be using encrypted SSDs or applying encryption on FMD using the back-end controllers. These will happily coexist in the same solution, so customers do not need to throw away their investment.

 

 

Ian: So what sort of investment are customers going to have to make to support this technology?

 

Ken: That’s a good question. We have worked hard to make sure there is not a large cost penalty for customers wanting to adopt this technology. Today there are two costs associated with the FMD HDE solution: first, the price of the drive itself, which is comparable to the regular FMD HD drives; second, a one-off license for the array to enable the technology. The license is not based on capacity, so customers can grow without paying additional licensing costs to use encryption.

 

 

Ian: So let’s close out, what other technological advancements can we expect to see in our storage platforms in the coming 12 months?

 

Ken: Hahaa no comment, but it starts with N and ends with VMe!

 


 

Of course there was more… every time we do an interview there is material that just doesn’t make the cut. In this case we dived deep into the Virtual Storage Platform (VSP) world, full socks-and-sandals spec, even for me!

 

The key takeaway for all Hitachi customers from this blog has to be the ongoing investment in storage technology, especially FMD. I am always excited to see new features and functions roll out of engineering, especially when they are so focused on helping our customers.

 

I have to extend a big thanks to Ken for his time today and great insight into the ongoing investment into flash storage technology from Hitachi for our customers. Dilly Dilly my friend!

 


Keep moving forward!

 

Cheers,

Bear


 

Recently Hitachi Vantara introduced an upgraded line of VSP all-flash and hybrid flash arrays, but that’s not all.  With this introduction came the release of the latest version of our Storage Virtualization Operating System, affectionately known as SVOS.

 

As a longtime product marketer in the technology world, one of the most daunting tasks that one can face is product naming and branding.  In the case of SVOS, I set about it by asking Hitachi Vantara customers what their experience was with SVOS.  Time and time again I would receive the answer, “Love it, because it just works”.  Now although I appreciated the compliment, truthfully it was not helping me with the task at hand.  I didn’t think customers would be wowed by the messaging “it just works”.  As I struggled to find the perfect designation for the latest version of SVOS, the “it just works” sentiment reared its head again in a recent TechValidate survey. The results of the survey showed that customers who chose Hitachi all-flash arrays did so because of Hitachi’s reputation for enterprise reliability.  At that moment it became clear that it wasn’t I who needed to name the latest version of SVOS; it was Hitachi Vantara customers!  Just as Federal Express customers essentially re-branded the company to FedEx, I started down the path of using customer testimonials to name the new SVOS. Along the way, customers told me that they value SVOS and VSP all-flash array solutions most for the mission-critical application availability and resilience that they provide. The resulting name was SVOS RF!  Why? Because customers tell us that SVOS RF is the premier “Resilient Flash” operating system and they depend on it for delivering the highest application availability.

 

4 Reasons Why SVOS RF Is Different From Competitive Flash Operating Systems:

  1. Enterprise-Class Uptime and Zero RPO/RTO – Three-data-center redundancy when teamed with Hitachi Global Active Device, supported by an industry-leading 100% Availability Guarantee.
  2. “Flash-Aware I/O Stack” Delivers End-to-End Quality of Service – Industry-leading quality of service is enabled by preserving each application workload’s share of channel and NAND chip resources should contention occur.  The benefit is that SVOS RF QoS intelligence dynamically allocates more IOPS to your applications during peak times by shifting those cycles from other operations (i.e., replication). Application availability and performance come first with SVOS RF.

  3. SVOS RF Virtualizes 3rd-Party Arrays – Yes, you heard that right! In fact, Global 100 companies use SVOS RF’s 3rd-party virtualization capability and place Dell/EMC, NetApp, and a host of other 3rd-party arrays behind the Hitachi VSP F series. This simplifies management and delivers Hitachi’s world-class data availability to the virtualized storage pool. Why do they do it?  Because Hitachi leads in application availability, and we spread the love by allowing our customers to recoup their investment in competitive arrays by virtualizing them behind the Hitachi Vantara VSP series.  Think of it as virtualization’s take on “Follow the Leader”. This unique capability is made possible by SVOS RF, and customers even use it to migrate data to and from heterogeneous arrays.

  4. Mainframe-Class 24/7 Application Availability – OK, I mentioned the “M” word… Go ahead and laugh, but the fact of the matter is that mainframes are still “King” in the most mission-critical financial environments.  SVOS RF enables the Hitachi Vantara VSP series to deliver all-flash storage goodness to these behemoths of the data center with the bulletproof availability required for mainframe storage.  Hitachi Vantara is a leader in mainframe flash storage and we are proud to serve this high-end, uncompromising segment.

 

Be Different, Be Better

So, there’s a sampling of why Hitachi Vantara customers claim with enthusiasm that “Hitachi VSP flash storage just works”.  That’s something special in today’s world of information overload, hype, and “fake IT news”. None of this surprises us here at Hitachi Vantara, because our heritage is a company that designs medical imaging equipment, water treatment, heavy industry, bullet trains, and power plants.  Yes, that stuff must work because it is life critical… no excuses.   Hitachi Vantara is different from any other IT infrastructure provider because we have operational technology in our DNA. So, if you need a nuclear power plant, wind turbine or photovoltaic power generation system, please call us.  If you need an industry-leading all-flash array, please call us for that too.  We assure you that you’ll get a solution “that just works”  : )

 

See More Great Content on Data Center Modernization!

 

Hu Yoshida - Transforming Data Center Focus from Infrastructure to Information

Nathan Moffitt – Data Center Modernization

Richard Jew - AI Operations for Modern Data Centers

Rich Vining - Data Center Modernization?  Include Modern Data Protection

Summer Matheson - Bundles are Better

Neil Salamack - Facts About Data Reduction

Paula Phipps - The Super Powers of DevOps to Transform Businesses

I remember coming home after elementary school each day, turning on the TV (of course!), and getting my recommended daily allowance of Sesame Street. The theme music was calming, and so was the familiar cast of characters I loved – Bert and Ernie, Big Bird, and of course Kermit.  Many of you may share these great memories…

 

So, what does Sesame Street have to do with an Oracle Enterprise Data Warehouse, and the critical data that is the backbone of any organization? Read on my friend.

 

With the help of Count Von Count, I’d like to share with you the top four fun facts about your Oracle Enterprise Data Warehouse (EDW).

 

  1. As data volume increases, inefficiencies in your enterprise data warehouse (EDW) can prevent you from realizing the full value of your data.
  2. In a typical EDW, 50-70% of data is cold or unused, resulting in increased query and backup times.
  3. Extract-transform-load (ETL) processes consume more compute and storage capacity, leading to higher licensing and management costs.
  4. You can optimize your Oracle EDW by offloading cold data to a more cost effective, NoSQL database.

 

With Hitachi Solution for Databases – Optimized Enterprise Data Warehouse, we enable you to utilize a fully tested and certified solution that offloads cold and unused data to a data lake based on a MongoDB appliance cluster. With this approach, you can reduce costs, deliver faster access to data, and provide better information for decision-making.

 

 

We can automatically map data between the Oracle database and MongoDB, speeding the offload operation and reducing manual processes by up to 90%. Simple and easy.
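To make the mapping idea concrete, here is a minimal sketch of a cold-data offload written in Python. It assumes the python-oracledb and pymongo client libraries, and the connection details, table, column and collection names are purely illustrative; the packaged Hitachi solution automates this workflow rather than requiring hand-written scripts.

```python
# Illustrative sketch only: copy "cold" rows (older than 24 months) from an
# Oracle table into a MongoDB collection. All names and credentials are
# placeholders; the packaged solution automates this end to end.
import oracledb                   # assumes python-oracledb is installed
from pymongo import MongoClient   # assumes pymongo is installed

oracle = oracledb.connect(user="edw", password="secret", dsn="dbhost/ORCLPDB1")
archive = MongoClient("mongodb://mongo-node1:27017")["edw_archive"]["sales_cold"]

cursor = oracle.cursor()
cursor.execute(
    "SELECT order_id, customer_id, order_date, amount "
    "FROM sales WHERE order_date < ADD_MONTHS(SYSDATE, -24)"
)
columns = [col[0].lower() for col in cursor.description]

while True:
    rows = cursor.fetchmany(1000)          # move the cold rows in batches
    if not rows:
        break
    archive.insert_many([dict(zip(columns, row)) for row in rows])

# After verifying the copy, the cold rows can be purged from Oracle to shrink
# the warehouse and speed up queries and backups.
```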

 

Sound interesting?   These reference architectures and solution profile will provide you with more in-depth information.

 

Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database Reference Architecture

Hitachi Solution for Enterprise Data Intelligence with MongoDB Reference Architecture Guide

Solution Profile: Hitachi Solution for Databases – Optimized Enterprise Data Warehouse


Dear IT Executives,

 

These days you are being asked to do more than ever before.  Bring on more applications, faster.  Add more capacity to existing applications.  Deliver more performance. Do more analysis of data to unlock insights. Store more data types (block, file and object) that are being generated from more and more sources.  And make that data accessible to more users who are running more applications on more devices. And don’t forget the need for more security, more risk avoidance and more regulations to comply with. Oh, and data center budgets aren’t nearly keeping up with this rate of change.  So, when it comes to spending, that’s where the “mores” end.

 

The only way to meet these challenges is to modernize your data center.  Unless your business is static, there’s no way that older architectures can enable you to keep up with the rate of change that’s coming your way.  Said another way, digital transformation initiatives will require new and faster systems as well as newly designed processes.  Just storing and retrieving data, which is what older architectures were good at, is no longer enough.

That’s because digital transformation puts data at the center of your business which means that data must be dealt with in brand new ways.

 

A key element of data center modernization is an agile data infrastructure.  By infrastructure agility I mean the ability to move fast despite the obstacles that keep you from adopting advanced technologies and processes. Of course, your company’s services need to run fast and with non-stop availability, but you also need workload-optimized architectures in place to store your data.


Because a one-size-fits-all approach won’t meet the requirements, a use-case-optimized data center includes powerful AFAs or hybrid arrays for transactional processing, converged or hyperconverged solutions for cloud-delivered services, and object storage for longer-term retention of data for both compliance and data lakes.  However, these architectures should never operate in silos.  You need the ability to easily move data to the system where it optimally resides and to set policies in an integrated fashion.

 

Fortunately, at Hitachi Vantara we’ve announced a major expansion of our agile data infrastructure portfolio with a new Virtual Storage Platform (VSP) all-flash F series and hybrid flash G series lineup.  From midrange to enterprise, the eight new models that we introduced are designed for modernized data centers.  Their benefits include:

  • MORE IOPS/$: Get up to 3X the IOPS performance as compared to the prior gen models and at better pricing for the same configurations.  In addition, latencies have been reduced by 25% for faster response times that are well below 1 millisecond, even when systems are pushed to their full performance capabilities.
  • PEACE OF MIND FOR ALL: Hitachi offers the only 100% Data Availability Guarantee in the industry.  All of the new models carry this guarantee.
  • GREATER VALUE: Each of the new models comes with the Foundation software package which is included in the price.  Foundation is rich with features including: our newest operating system--SVOS RF; a Universal Volume Manager license for virtualizing external storage; ShadowImage, Thin Image and Hitachi Data Instance Director for data protection; Dynamic Tiering for data mobility; and Hitachi Infrastructure Analytics Advisor for powerful analytics.
  • IMPROVED CONSOLIDATION: Massive capacity which can scale up to 17.4PB of raw flash capacity.
  • PREDICTABLE ONGOING COSTS: All of the new G and F series models will have flat service pricing for up to 7 years as part of our Flash Assurance Program.  The price of a support contract won’t increase after the warranty period expires.
  • SELECTABLE DATA REDUCTION SERVICES: We offer the unique feature of allowing data reduction services to be selectable at the LUN, or volume, level which allows our users to optimize their systems between max performance and max data reduction.
  • GUARANTEED CAPACITY EFFICIENCY: Reduce the amount of flash capacity you need by up to 75% (4:1) or we’ll give you additional capacity for free.
  • AI OPERATIONS: All new systems come packaged with Hitachi Infrastructure Analytics Advisor (HIAA) which has been significantly upgraded to provide prescriptive resolutions to data center problems based on machine learning for faster time to resolution with optimized outcomes.  It can even predict future data center resource requirements based on sophisticated trending analysis.

 

The newly refreshed VSP model line-up can be combined with our UCP converged and hyperconverged solutions and our HCP cloud object platform to create the agility you will need within your data center.

It’s also very critical that you have a long-term transition plan in place that gets you to a modernized data center.  Without a solid plan, you risk wasting investments in solutions that don’t take you in the best direction.  It’s important that you work with a partner whose professional services practice has the expertise to help you navigate the transition and execute for the future.


We’ve partnered with many companies like yours who have successfully undertaken some very challenging transformation projects. CPFL, a Brazilian electrical utility, is using our technology to support their implementation of smart grids to better match the generation and delivery of electricity with its demand. Deluxe transitioned their primary revenue stream from the production of printed checks to providing analytics services around business transactions.  And Meat and Livestock Australia turned their stock into an Internet of Things that generates data which is stored and analyzed on our VSPs. Our technology and professional expertise are trusted by 9 out of 10 of the world’s largest banks and telecommunications companies for modernizing virtual data centers in today’s multi-cloud world.

 

We’d love to partner with you as well, so you can be successful in providing a better experience for your customers, creating new data-driven services that lead to additional revenue streams, and optimizing your operations for cost reduction.  We know we can help your company improve its customer satisfaction levels and make you a star in the eyes of your fellow executives.  Let us help you.

 

Best,

Mark Adams

Hitachi Vantara

 

P.S. You can read more great blogs in this series by:

 

Hu Yoshida - Transforming Data Center Focus from Infrastructure to Information

Nathan Moffitt – Data Center Modernization: It's More Than Happy Little Trees

Nathan Moffitt -  Roadmap to an Autonomous Data Center

Nathan Moffitt - AI Operations: Designing a Low Touch Infrastructure

Richard Jew - AI Operations for Modern Data Centers

Angela MaGill -  Program your SAN with next-gen automation. Yes, really!

Rich Vining - Data Center Modernization?  Include Modern Data Protection

Summer Matheson - Bundles are Better

Neil Salamack -  Tell Me the Truth... Does My Data Look Fat?

Neil Salamack -  Containers, OpenStack and the Brave New World of Dev Ops

Neil Salamack -  4 Reasons Why SVOS RF Sets The Bar for Flash Availability

Paula Phipps - The Super Powers of DevOps to Transform Businesses

Reduce Risk, Improve Efficiency and Prepare for the Future with AI Operations

 

I like to write. Especially on topics I'm passionate about. Normally that means I write a page or 2 on a topic. Today though I'm trying something different. I'm digging in to provide a deeper look at how you can build a roadmap for an autonomous data center.

 

I'd say a plan for an autonomous data center, but the fact is that we're just at the forefront of seeing solutions that enable autonomy. And unfortunately, most vendors are designing software that is too 'vendor specific' and narrow in scope. To get where we want to be, software offerings need to work together, integrating insight and action so the data center can manage itself. Only this way will staff be truly free to focus on innovation.

 

 

Hitachi is pushing to make this happen faster. Pushing outside our normal comfort zone and looking at ways to accelerate change. But there's a lot to do. Hence, the length of this post and the need for an infographic (Hey, I used PowerPoint, don't hassle me. ). So read on!

 

[Infographic: roadmap to an autonomous data center]

 

 

Your Data Center. Simple in Silos.

 

When individual applications or infrastructure components are deployed, things often seem simple. Resources are delivered, monitoring is put in place and everything looks good. At a siloed, 'project level' this is true.

 

When you pull the lens back though and look at the data center as a whole, you see groups of systems, networks and software working together / sharing resources to perform tasks. You see a living organism where an issue in one area can cause ripples in the data center fabric that impact uptime, performance, resource utilization and ultimately the customer experience as well as your budget or regulatory compliance.

 

AI Operations Enable an Autonomous Data Center

 

To increase confidence that operations will run smoothly, AI Operations software is needed. AI Operations software collects analytics from across your data center to predict and prescribe adjustments that help the entire data center run more efficiently. It can (and should) also automate processes to accelerate action, moving you toward delivery of an autonomous data center.

 

But where do you start and how do you approach implementing AI operations to govern the systems, software and services that make up your data center? Feedback from our customers tells us that there are a few steps to consider:

 

  • Step 0: Set Near and Long Term Scope
  • Step 1: Automate Deployments
  • Step 2: Implement Data Center Analytics
  • Step 3: Combine Analytics with Automation
  • Step 4: Extend the Framework
  • Step 5: Enable Tactical and Strategic Autonomy

 

Note: At every stage, AI-based analytics are only as good as the data received. Special emphasis should be placed on the quality, granularity and history of the data analyzed to ensure accuracy.

 

Step 0: Set Near and Long Term Scope

 

Before purchasing and implementing AI operations software for a data center, it is important to define what you want from the solution – near and long term. This should include definition of:

 

  • Data center devices – systems, software and services – that AI will manage
  • Operations you will allow AI to handle autonomously
  • Operations you will allow AI to handle semi-autonomously
  • Data center devices you will want AI to manage long term

 

The last point is especially important because while AI offerings are rapidly evolving, their scope of coverage is still relatively narrow. Many offerings are vendor-specific with limited ‘line of sight’ to how their actions could affect other systems – positively or negatively.

 

To minimize the potential for “silos of AI operations” that interfere with each other, define a clear scope of what will be controlled, and how other systems will be affected if AI acts autonomously. It is also important to understand how devices will be added over time.

 

Note: API-driven offerings can help smooth the expansion of AI across your data center by providing a common interface for communication, in particular when existing management practices and processes must be integrated. This can enable long term agility and enable creation of a collective AI that leverages the expanded set of analytics to make increasingly smarter decisions.

 

Step 1: Automate Deployments

 

Perhaps the best first step toward an autonomous data center is ensuring best practices and associated policies are followed. When resources are deployed according to best practices, their behavior is predictable and the need for AI to identify complex or unseen issues is minimized.

 

Best practices alone though are not enough, especially if numerous configuration tasks must be executed during deployment. To prevent accidental errors like a step being skipped or followed improperly, best practice processes must be automated. Automation software helps ensure the successful delivery of systems, software and related services like data protection by implementing:

 

  • A predefined catalog of best practices for deploying systems and software
  • Customizable best practices to support your specific data center resources, service level objectives and data management policies.
  • An engine to automatically implement the catalog with minimal human interaction

 

These features enable your staff to provision and manage data center resources, greatly reducing the risk of downtime, data loss and sub-optimal performance. They also free your experts to focus on driving the business forward, not troubleshooting deployments.

 

AI CONSIDERATION: Automation engines can be designed to do more than follow a guided set of steps. AI can look at available resources and determine which are under-utilized or will provide the best ‘experience,’ increasing ROI. If an automation AI understands the data path and workloads, it can help prevent issues that impact application stability and end user experiences.

 

You should also consider how automated actions will be tracked. This way you have a history of events & actions that were performed for ongoing analysis. Integration with ITSM tools is helpful here.

 

Step 2: Implement Data Center Analytics

 

Once resources are deployed, it is important to make sure they continue to perform as expected – individually and as part of a whole ecosystem. If environments are not regularly tuned as a complete system, they will never deliver maximum performance and stability. Only through ongoing monitoring and optimization can you prevent systems from degrading over time and impacting broader data center operations.

 

To keep operations running smoothly, data center analytics software incorporates AI and ML (machine learning) that looks across your environment to determine what is happening – or has happened – and what to do next. This includes:

 

  • Ecosystem Optimization
  • Budget Forecasting
  • Fault Prediction and Identification
  • Anomaly Detection
  • Root Cause Analysis and Prescribed Resolution
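To illustrate just one of these capabilities, anomaly detection, here is a deliberately simple sketch in Python that flags metric samples deviating sharply from a rolling baseline. It is not Hitachi's algorithm, only a way to picture what "looking across telemetry for unusual behavior" means; real AI Operations tools use far richer models and data.

```python
# Illustrative only: flag metric samples that deviate strongly from the recent
# baseline. Real AI Operations tools use far richer models and telemetry.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(samples, window=60, threshold=3.0):
    """Yield (index, value) for samples more than `threshold` standard
    deviations away from the mean of the previous `window` samples."""
    history = deque(maxlen=window)
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        history.append(value)

# Example: steady ~2 ms latency readings with a single 25 ms spike
latencies = [2.0 + 0.1 * (i % 5) for i in range(200)]
latencies[150] = 25.0
print(list(detect_anomalies(latencies)))   # -> [(150, 25.0)]
```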

 

It is important to note that many analytics offerings are product-focused, not data-center-focused. This limits their ability to accurately forecast needs and identify fault resolutions. To achieve the best possible outcomes, dependencies along the data path must be considered before making a decision.

 

AI CONSIDERATION: Where and when AI decision making occurs is important. If it happens offsite, make sure your organization allows external transmission of system information. If data is only collected every few hours, understand how that will affect speed and quality of analysis.

 

Step 3: Combine Analytics with Automation

 

Analytics deliver powerful insights into how operations are performing and what changes should be made to improve / repair the environment. But if analytics only inform or prescribe changes, you are still responsible for executing prescribed actions.

 

This may be appropriate for some actions, e.g. issuing a purchase order for more capacity, but in other instances it can delay issue resolution and create risk. As noted in Step 1, automation is critical to minimizing the potential for accidental errors. By linking automation with analytics, data center teams can significantly reduce the time to implement changes and assure adherence with best practices. For instance:

 

  • Real Time Configuration Adjustment: E.g. Analytics AI identifies a performance issue and prescribes a change to QoS levels. It then executes the update via the automation engine.
  • Service Execution: E.g. Analytics AI identifies that a data protection policy has not executed a snapshot recently. It then automatically runs the snapshot service via the automation engine (a minimal sketch of this hand-off follows the list).
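Here is that hand-off sketched in Python, with entirely hypothetical finding and action names; the point is only that the analytics output is machine-readable enough for an automation engine to act on it directly, with the action logged for later review.

```python
# Hypothetical sketch of closing the loop between analytics and automation.
# The finding and action names are illustrative, not a real product API.
from dataclasses import dataclass

@dataclass
class Finding:
    resource: str   # e.g. a volume or storage port
    issue: str      # what the analytics engine detected
    action: str     # prescribed remediation
    params: dict    # parameters handed to the automation engine

def analytics_engine() -> list:
    # Stand-in for the analytics AI: it returns prescriptions, not just alerts.
    return [
        Finding("vol-042", "latency above SLA", "raise_qos_priority", {"iops_limit": 20000}),
        Finding("db-prod", "snapshot policy missed", "run_snapshot", {"retention_days": 7}),
    ]

def automation_engine(finding: Finding) -> None:
    # In a real deployment this would call the automation product's API and
    # record the action in an ITSM system for later review.
    print(f"[automation] {finding.action} on {finding.resource} with {finding.params}")

for finding in analytics_engine():
    automation_engine(finding)   # prescriptions executed without waiting on a human
```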

 

Some organizations may decide to start with a solution that combines insights and action. Others may implement these functions in discrete stages. In the latter case, it is critical to identify vendors that can offer upgrades or product / vendor integrations to combine offerings.

 

AI CONSIDERATION: Analytics and automation offerings can each have their own AI functions. In most offerings the analytics serve as the ‘brain’ and automation is the ‘engine,’ but it is still important to understand if they can work together to make smarter decisions. Over time, analytics and automation will likely become more tightly coupled to improve efficiency.

 

Step 4: Extend the Framework

 

For many, the journey to an autonomous data center will likely pause after Step 3. This allows teams to review predictive and prescriptive analytics to improve best practices as well as expand the scope of actions that are automated.

 

After this is accomplished, it is time to identify areas where the framework can be extended. There are multiple paths forward that organizations may consider including:

 

  • Deeper data path integration: E.g. Integrating application analytics to measure the impact of latency on transactions and use that information to more precisely define QoS levels or forecast when resources will be needed to meet performance SLAs.
  • Broader Service Management Control: E.g. Integrating an infrastructure automation engine with an ITSM platform to enable better control over deployment and management of data center components for a more robust service management experience.
  • Facilities Analytics: E.g. blending in additional data sets, like power and cooling analytics, to make better decisions around energy and operations management.

 

How this stage is implemented will vary significantly based on organizational needs and may require professional services work, depending on the outcomes desired. It will be worthwhile, though, as the learnings here lay the groundwork for Step 5.

 

AI CONSIDERATION: A key factor in any AI implementation is establishing how and when AI will interact with human counterparts. During initial deployments, and especially as the framework is extended, it may be desirable to have machine-to-human communications occur before any action is taken. Over time though, as comfort levels increase, you may decide to allow greater AI autonomy and only receive notifications that actions have been taken.

 

Step 5: Enable Tactical and Strategic Autonomy

 

Up to this point we have focused on an overarching AI to manage data center operations. This focus is based on the idea that most systems and software may be able to govern themselves, but they do not have the ability to collaborate with other systems to make decisions.

 

Long term though this will change. Over the next several years we will see increasing levels of intelligence in the systems that make up data centers. At that point we will want to turn over certain tactical, AI operations to sub-sets of systems in the data center.

 

For instance, applications may work with network and storage devices to determine the best path or location to route data and work around faults. Or applications may predict upcoming job types and, based on associated SLAs, request migration of data sets to higher performance storage. 

 

It will still make sense to have an overarching AI analyze and execute strategic operational decisions, but tactically it is important to allow sub-sets of devices to make real-time decisions about how they work together to achieve discrete goals and overcome local obstacles.

 

Ultimately, Step 5 is about taking the concepts of initial stages and applying them to groups of devices that must work together. As before, this will require an API-driven interface for devices to communicate, a shared language and hierarchy of leadership for making joint decisions. How this plays out is still being defined and will likely expand Step 5 into multiple stages. Until then, the best thing to do is look for offerings that have a roadmap for device-to-device communications.

 

 

So there you have it. I hope you found this informative. This is a topic I've been waiting months to blog on, and I'm excited to finally share it. If you have questions or comments, let me know. I'm happy to discuss and always open to ideas on how we can expand the conversation.


According to the wiki definition, a “creator” is “something or someone who brings something into being”. These days, with all the digital technology and tools available, there’s never been a better time to be a creator.  With businesses working to be the next disrupter, there have never been more tools available to accelerate innovation. Every so often a “process” change is so significant that it becomes much more than that: it becomes a broad, sweeping movement or even a culture. That is precisely what is happening with DevOps. For those who don’t know DevOps, at its core it enables businesses to accelerate the development and release of products by applying agile and lean principles to software development.  The rate of business change is the main driver for DevOps, and that’s why its adoption has spread like wildfire. The movement started with small companies and is now driving process and cultural change in large enterprises.  But it’s not all about DevOps per se; the tools available to software developers and IT engineers enable DevOps teams to develop, maintain, and release new products and features at a dizzying rate. Let’s discuss some of the most significant tools that are changing how software development is done and how IT environments are deployed.

 

DevOps benefits include:

  • Overall accelerated business innovation
  • Deploy with a process that can scale, and is both repeatable and reliable
  • Integrated process for capturing and measuring data points on quality
  • Built-in propagation of process benefits
  • Ownership of both development and operations which extends to the production environment and customer experience

 

There’s no “Me” in “TEAM”…

Uh… well, there may be a “me” in “team”, but there’s also an “us” in DevOps : )  Remember, it’s not just a process, it’s a culture! And it’s never been easier to be part of the “us”, because everyone can work on their own code or module and check in at any time.  Consequently, the DevOps model is flexible, enabling updates and enhancements at any point in this perpetual process.  Think of the development team as a collective, driving higher synergy through interaction.

 

Tools of the Trade - Containers Enabling DevOps



There is a huge amount of hype around containers, heralded as the virtual machine of the new millennium. There’s good reason for the excitement, as containers offer several huge benefits, like the ability to run an application in a lightweight “container” that is agnostic of the underlying OS.  But that’s not all.

 

Containers are super tools for DevOps for the following reasons:

  • Velocity – Containers spin up faster (Average of 500ms compared to 2-7 minutes for a virtual machine) enabling the developer to test and retest quickly
  • Easy - Simplified, quicker code integration of multiple modules
  • Accelerated delivery pipeline and code release process
  • Develop one version, run it on anything, anywhere

 

OpenStack and DevOps
Transparency, sharing, and open exchange are embedded in the DevOps culture and process. It’s no surprise that OpenStack is finally finding its place within DevOps.  OpenStack is maturing and has developed a fast-growing ecosystem. It enables the DevOps community by providing an open, affordable and stable platform upon which to deploy “containerized” apps. OpenStack offers robust data services and leverages open APIs for simple, standardized application development. The OpenStack block storage service is called “Cinder”, and a growing number of enterprises are adopting it as part of their DevOps repertoire.

Since this is a “storage” related blog, you may be asking how enterprise storage plays in to support containers, OpenStack and the overall DevOps initiative.  Well, here’s where the rubber meets the road.

 

What about Storage for Containers?
Good question.  Containers are light and agile: agile meaning that they can be spun up and deleted so quickly that they have a disposable, ephemeral quality.  This is fine for cloud-native applications that deliver search results and then delete the container, but what about traditional applications like databases, which require “persistent” storage?  Luckily, storage vendors like Hitachi Vantara offer plugins for their storage OS to enable a persistent connection between the container and the storage. The Hitachi Storage Plug-in for Containers provides connectivity between Docker containers and Hitachi VSP storage platforms. With the plug-in, Hitachi customers can deliver shared storage for Docker containers that persists beyond the lifetime of a single Docker host. This is essential for enabling DevOps and agile IT. It also makes stateful application development available for container-based workloads. With a persistent connection to the storage, containers can be protected with high availability, replication, and snapshots. As containerized apps find their way into mission-critical applications, enterprise-class data protection capabilities will be required.
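For a feel of what persistence looks like from the container side, here is a small sketch using Docker's built-in "local" volume driver; in a Hitachi environment the driver name would be the vendor plug-in identifier from its documentation, which is environment-specific and so left as a placeholder here.

```python
# Minimal sketch: a named Docker volume whose data outlives any one container.
# The built-in "local" driver is used here; with the Hitachi plug-in installed,
# the driver argument would be the plug-in's identifier instead.
import subprocess

def create_volume(name: str, driver: str = "local") -> None:
    # docker volume create -d <driver> <name>
    subprocess.run(["docker", "volume", "create", "-d", driver, name], check=True)

def run_database(volume: str) -> None:
    # Mount the named volume at the database's data directory so the data
    # survives after this container is stopped and removed.
    subprocess.run(
        ["docker", "run", "-d", "--name", "pgdemo",
         "-v", f"{volume}:/var/lib/postgresql/data", "postgres:15"],
        check=True,
    )

if __name__ == "__main__":
    create_volume("db-data")
    run_database("db-data")
```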

 

What about Storage for OpenStack?

As mentioned earlier, OpenStack Cinder provides a REST API for requesting and managing block storage. Leading storage providers like Hitachi Vantara offer drivers, such as the Hitachi Block Storage Driver plugin, that enable enterprises to leverage their existing storage for OpenStack. The benefits are similar to the container plugin in that it opens up rich storage services for OpenStack-based applications.
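From the application side, requesting a Cinder volume is a few lines of code. The sketch below assumes the openstacksdk client library and a cloud profile named "mycloud" in clouds.yaml, both placeholders; whichever back-end driver Cinder is configured with services the request, so the application code does not change.

```python
# Minimal sketch, assuming the openstacksdk library and a cloud profile named
# "mycloud" in clouds.yaml (both placeholders). Cinder exposes the REST API;
# the configured block storage driver fulfils the request behind the scenes.
import openstack

conn = openstack.connect(cloud="mycloud")

# Request a 100 GB block volume for a containerized database
volume = conn.block_storage.create_volume(size=100, name="devops-db-data")
print(volume.id, volume.status)
```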

 

The Takeaway – Roll with the Changes

So keep DevOps on your radar; in fact, you may want to get in front of the wave by initiating some cultural and process changes before your competitors do.  Luckily, Hitachi Vantara is here to help you leverage storage solutions that support DevOps and help you win the game.  So go ahead, make some DevOps noise and disrupt your competition.

 

Learn About DevOps

Paula Phipps’ Blog - The Super Powers of DevOps to Transform Business

 

More great blogs in the Data Center Modernization Series here:

Hu Yoshida's blog - Data Center Modernization - Transforming Data Center Focus from Infrastructure to Information

Nathan Moffitt's blog – Data Center Modernization

Mark Adams's blog - Infrastructure Agility for Data Center Modernization

Summer Matheson's blog - Bundles are Better

Richard Jew's blog - AI Operations for the Modern Data Center

Facts About Data Reduction

We’ve all heard the cliché, “If it sounds too good to be true, it probably is”.  In many situations this turns out to be the case, but I’m not writing this blog to throw shade at data reduction.  Actually, data reduction is an amalgamation of super-useful technologies, like compression and deduplication, which yield additional “effective” storage capacity from your existing usable capacity. This proposition is especially attractive when it comes to all-flash storage, because data reduction can significantly lower your cost per GB.  But there are questions you should be asking, and that’s what I want to call out.  As a starter, it is important to understand the definitions used in data reduction marketing jargon, so here’s a primer.

  • Data Reduction – Compression and deduplication technologies
  • Total Efficiency – Compression, deduplication, snapshots, thin provisioning
  • Raw Capacity – Total disk space available to the array
  • Physical Capacity – Capacity available after formatting the media
  • Usable Capacity - Physical capacity available after factoring RAID data protection overhead and spare drive capacity
  • Effective Capacity – Usable capacity available after deduplication and compression are applied (see the worked example after this list)
  • Free Capacity - Unassigned space in a volume group or disk pool that can be used to create volumes
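Here is the promised worked example of how those terms relate; the overhead percentages and the 2:1 reduction ratio are illustrative assumptions, not a quote for any particular array.

```python
# Illustrative arithmetic only: how raw capacity becomes effective capacity.
raw_tb = 100.0                   # total flash installed
physical_tb = raw_tb * 0.97      # after media formatting (example figure)
usable_tb = physical_tb * 0.80   # after RAID overhead and spares (example figure)
effective_tb = usable_tb * 2.0   # assuming a 2:1 data reduction ratio

print(f"usable: {usable_tb:.1f} TB, effective: {effective_tb:.1f} TB")
# usable: 77.6 TB, effective: 155.2 TB
```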

 

So you may be thinking: if compression and deduplication are the basis for data reduction, is one vendor’s compression and deduplication better than another’s? Flash vendors use very similar compression and deduplication technology.  For example, vendors broadly rely on the same proven LZ77 compression algorithm. Deduplication schemes don’t vary much either, as the premise is the same: identify patterns and eliminate duplicate copies of data, inserting a pointer to the original data where a duplicate exists.  Sooo… this leads to the question, “Won’t all vendors’ results be the same if they all use the same compression and deduplication technology for data reduction?”  The answer is yes; if not exactly the same, the results will be very similar.
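As a toy illustration of that deduplication premise (fixed-size chunks, duplicates replaced by a pointer to the stored original), consider the following sketch; real arrays do this inline, with far more sophisticated chunking and metadata handling.

```python
# Toy deduplication: store each unique chunk once, keep pointers (hashes) for
# duplicates. Real arrays do this inline with far more sophisticated metadata.
import hashlib

CHUNK = 4096   # fixed chunk size in bytes

def dedupe(data: bytes):
    store = {}      # hash -> unique chunk, written once
    pointers = []   # ordered hashes that describe the original data stream
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk   # new data: store it
        pointers.append(digest)     # duplicates cost only a pointer
    return store, pointers

data = b"A" * CHUNK * 8 + b"B" * CHUNK * 2   # highly repetitive sample
store, pointers = dedupe(data)
print(f"{len(pointers)} chunks written as {len(store)} unique chunks")
# 10 chunks written as 2 unique chunks
```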

 

4 key considerations when evaluating data reduction:

 

24:1 is the most baby!  But this “bug” isn’t going to win any races…

 

The first is performance, or overall system IOPS.  If you have already settled on an all-flash solution, there’s a good chance the reason was to deliver better performance and quality of service to your customers and the business.  Regardless of what vendors claim, compression and deduplication can adversely affect system performance because those data reduction operations need to be handled either in silicon or in software.  This is called the “data-reduction tax”.  How big a “tax” you pay is directly related to how efficiently the vendor has implemented the solution. Moreover, everyone’s environment and use case is different, so results can vary widely regardless of vendor claims and guarantees.

 

The second big-ticket item to consider is the type of data you are going to be reducing.  The fact of the matter is that some data types compress very well and others don’t at all. For example, remember the first time you tried to “zip” a PowerPoint presentation because the darn email system wouldn’t allow attachments over 10MB?  After swearing and cussing at Outlook, you thought zipping that big honkin’ presentation would be the answer.  Then you learned that zipping that .ppt got you nothing!  That was a rookie move.   What you should know is that databases and VDI data compress very well.   Audio, video, and encrypted data don’t compress well at all, so there would be very little data reduction benefit for those data types.  The point I am making is that your data reduction benefit is going to vary directly with your data type.
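You can see the data-type effect for yourself with Python's built-in zlib, which uses the same DEFLATE/LZ77 family of compression discussed above; random bytes stand in for media or already-encrypted data.

```python
# Demonstrates why data type dominates compression results.
import os
import zlib

def ratio(data: bytes) -> float:
    return len(data) / len(zlib.compress(data))

text_like = b"customer_id,order_id,status,amount\n" * 30000   # repetitive, database-like
random_like = os.urandom(len(text_like))                       # stands in for media/encrypted data

print(f"repetitive data {ratio(text_like):6.1f}:1")
print(f"random data     {ratio(random_like):6.2f}:1")          # ~1:1, nothing to gain
```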

 

The third item I’m going to ask you to consider is whether your vendor gives you a choice to configure your storage volumes with or without data reduction.  If the answer is “no”, you should be concerned, and here’s why.  In an “always-on data reduction” scenario you don’t have the ability to balance data reduction against performance. This may be fine if your application can tolerate the inherent latency, but in most cases all-flash arrays are purchased to break performance barriers, not introduce them. I must point out that with the new Hitachi VSP series and Storage Virtualization Operating System RF, the user can balance flash performance and efficiency right down to the LUN level.  The result is a bespoke balance of IOPS and data efficiency, tuned for each individual environment.

 

The fourth is probably the most important attribute, and one few vendors are willing to discuss: is the data reduction guarantee or claim backed by an availability guarantee? Did you know that “availability” is the number one selection criterion when purchasing an all-flash array?  What good is a 7:1 efficiency ratio if you can’t get to the data?  Hitachi Vantara stood out from the crowd by being the first to offer a 100% Data Availability Guarantee alongside its Total Efficiency Guarantee.

 

So in closing, here’s what I suggest when evaluating data reduction claims.  Don’t be fooled by the “my number is bigger than your number” claims.  The results you will see are highly dependent on your data and workload.  Work with the vendor to assess your environment with a sizing tool that provides a realistic expectation of results. Consider that you may not want to run compression and deduplication on certain workloads, in order to maximize performance. You will want the choice to turn data reduction on or off on different volumes within the same array. Also, beware of any vendor that promises you maximum flash performance with the highest data reduction ratios, because if it sounds too good to be true, it probably is.

 

For More Info on Hitachi Vantara Investment Protection and Total Data Efficiency:

Hitachi Vantara Up to 4:1 Total Efficiency Guarantee

Hitachi Vantara 100% Data Availability Guarantee

Hitachi Vantara Flash Assurance Program

 

 

You can read more great blogs in the Data Center Modernization Series here:

Hu Yoshida's blog - Data Center Modernization - Transforming Data Center Focus from Infrastructure to Information

Nathan Moffitt's blog – Data Center Modernization

Mark Adams's blog - Infrastructure Agility for Data Center Modernization

Summer Matheson's blog - Bundles are Better

Paula Phipps' blog - The Super Powers of DevOps to Transform Business

Richard Jew's blog - AI Operations for the Modern Data Center

Enterprises make copies of their critical data sets for assorted reasons: a copy for backup and fast, local recovery; a copy in one or two other locations for business continuity and disaster recovery; copies for the test and development teams; copies for finance and legal; and so on.

 

If these copies aren’t automated, controlled and secure, they can become costly and a serious liability.

 

Let’s start with the basics and walk through an example of how Hitachi Vantara, through the use of Hitachi Data Instance Director (HDID) and the range of technologies that it orchestrates, can help organizations automatically create, refresh and expire copy data.

 

Our main data center is in New York. In it, we have a production application server, let’s say it’s an Oracle database environment. The Oracle data is stored on enterprise-class storage – in this case a Hitachi Virtual Storage Platform (VSP) F-series all-flash array.

 

Now we need to make a periodic copy of the data for local backup and recovery. The old method of taking an incremental backup each night and a full backup on the weekend doesn’t work anymore. They take too long; often many hours to complete a backup. And they leave too much data at risk; a nightly backup means a recovery point objective (RPO) of 24 hours, which means as much as a full day’s worth of data is at risk of loss. Neither of these are acceptable service levels for critical applications and data.

 

So instead, we’ll take an hourly application-consistent snapshot using Hitachi Thin Image, which is part of the storage system’s Storage Virtualization Operating System (SVOS). The snapshot can be created as frequently as needed, but once an hour already improves your RPO and reduces the amount of data at risk by more than 95%.
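The math behind that claim is straightforward: moving from one recovery point per day to one per hour shrinks the exposure window from 24 hours to 1 hour.

```python
# Worked arithmetic for the RPO improvement
nightly_rpo_hours = 24       # one backup per day
snapshot_rpo_hours = 1       # one snapshot per hour

reduction = 1 - snapshot_rpo_hours / nightly_rpo_hours
print(f"data at risk reduced by {reduction:.1%}")   # -> 95.8%
```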

 

Next, we also have a data center in Boston, so we set up replication to another VSP there to enable business continuity. Since the latency between the sites is low, we can use active-passive synchronous replication (Hitachi TrueCopy), guaranteeing zero data loss. Or, we can support an active-active configuration to enable always-on operations, using the VSP’s global-active device storage clustering feature.

 

We can also have a 3rd site, let’s say London, connected by asynchronous replication, using Hitachi Universal Replicator, to protect against a major regional outage such as the power blackout that impacted the northeast corner of the United States in 2003. Most areas did not get power restored for more than 2 days, and those businesses that did not have a disaster recovery site outside of the impact zone were severely affected.

 

Flexible three-data-center topologies are supported, including cascade and multi-target. An additional feature called delta resync keeps the third site current even when one of the two synchronized sites goes offline.

 

Now that our data is completely protected from terrible things happening, we want to create additional copies for other secondary purposes, such as dev/test, finance, long-term backup, etc.

 

We can create space-efficient virtual copies with our snapshot technology. Or we can create full copy clones using Hitachi ShadowImage. Either way, they are created almost instantaneously with no impact on the production systems. When needed, the copy is mounted to a proxy server and made available to the user.

 

All of these copy operations may require multiple tools, complex scripting and manual processes. But with Hitachi Data Instance Director, we offer a way to automate and orchestrate all of it, combining these steps into a single policy-based workflow that is very easy to set up and manage.

 

[Figure: three-data-center (3DC) replication architecture]

 

We can then take this automation to the next level, by creating service-level based policy profiles. For example, think of gold, silver and bronze services, which are selected based on business needs for the particular application. These profiles can determine the frequency of protection, the tier of storage to use, user access rights, retention, etc.

 

Everything we’ve talked about can be easily tied into the Hitachi approach to data center modernization. For example, as Hitachi Automation Director (HAD) is provisioning the resources needed to spin up a new application workload, it can automatically provision the correct tier of data protection services at the same time. The communication between HAD and HDID is via a robust RESTful API.
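As a flavor of what such a REST-driven hand-off can look like, here is a sketch in Python; the URL, path, payload fields and token are invented for illustration and are not the actual HAD/HDID API, which is documented separately.

```python
# Hypothetical illustration only: ask a data protection service to apply a
# policy to a newly provisioned workload. Endpoint, fields and token are
# placeholders, NOT the real HAD/HDID REST API.
import requests

HDID_URL = "https://hdid.example.local/api"          # placeholder address

payload = {
    "application": "new-oracle-workload",            # workload just provisioned
    "policy": "gold",                                 # service-level policy profile
    "schedule": "hourly-snapshot",
}

response = requests.post(
    f"{HDID_URL}/policies/assign",
    json=payload,
    headers={"Authorization": "Bearer <token>"},      # placeholder credential
    timeout=30,
)
response.raise_for_status()
print(response.json())
```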

 

In the near future, Hitachi Infrastructure Analytics Advisor and Hitachi Unified Compute Platform Advisor will be able to monitor HDID and recommend opportunities to improve copy management processes.

 

To learn more about Hitachi Vantara's approach to modernizing data center operations, check out these blogs:

 

 

Rich Vining is a Sr. WW Product Marketing Manager for Data Protection and Governance Solutions at Hitachi Vantara and has been publishing his thoughts on data storage and data management since the mid-1990s. The contents of this blog are his own.

Data center modernization isn’t complete without the right IT Operations Management (ITOM) tools to ensure your data center is running smoothly.  Today’s data center operations are under constant change with new systems, technologies and applications being added, moved and fine-tuned.  Most ITOM tools have a domain specific view into the infrastructure that can be further restricted by vendor-specific approaches.  If you’re looking at a silo view of your data center, it can be difficult to ensure your applications are running at peak performance across all the various infrastructure and devices that are needed to support them.

 

To address these IT operation challenges, Gartner has been promoting the need for AI Operations, or Artificial Intelligence for IT Operations, where machine learning and big data are used for a new holistic view into IT infrastructure for improved data center monitoring, service management and automation.  Let’s see if Gartner is onto something here.

 

Gartner: AI Ops Platform*


 

AI Operations starts with gathering large and various data sets; lots of telemetry data from across disparate systems (applications, servers, network, storage, etc.) to be analyzed.  Using machine learning (ML) algorithms, this data is mined to gain new AI insights that can be used to optimize across these various infrastructure systems.  For example, an on-line retailer wants to assess their readiness for Cyber Monday workloads.  If they used domain-specific ITOM tools, they would only get a silo view (i.e. server or storage only) into their IT  operations that would limit their insights.  AI Operations tools benefit from aggregating analysis across multiple data sources providing a broader, complete view into the IT infrastructure that can be used to improve data center monitoring and planning.

 

In addition to monitoring, AI Operations can impact other IT operations processes, such as decreasing the time and effort required to identify and avoid availability or performance problems.  For example, it’s best to be notified that a data path between a server and a shared storage port is saturated and then quickly receive a recommended alternative path, with plenty of time to move the applications that may be overloading the saturated path.  Compare this approach to one where an administrator receives separate notices about performance problems on networking and storage ports, then needs to confirm the two issues are related before trying to find an acceptable solution.  AI Operations provides the opportunity to use machine learning to identify interconnected resource trends and dependencies in order to quickly analyze problems, compared to manual, siloed approaches that are typically based on trial and error.

 

Hitachi Vantara’s recent additions to its Agile Data Infrastructure and Intelligent Operations portfolio illustrate how these new machine learning and big data approaches can transform IT operations.  The new releases and integration between Hitachi Infrastructure Analytics Advisor (HIAA) and Hitachi Automation Director (HAD) provide new AI Operations capabilities to establish intelligent operations and the foundation for autonomous management practices:

 

  • Predictive Analytics – New ML algorithms and custom risk profiles to assess future resource (virtual machine, server or storage) requirements that incorporate all resource interdependencies.  It provides a more complete and accurate resource forecast as it includes performance and capacity as well as all dependent resource requirements on the same data path. This helps to ensure you are upgrading all the right data path resources with the proper configurations when adding a new application workload.
  • Enhanced Root Cause Analysis – A new AI heuristic engine diagnoses problems across the data path faster (4x) with prescriptive analytics recommendations.  By providing suggested resolutions to common problems, the effort and expertise required to troubleshoot performance bottlenecks is greatly reduced, further lowering mean-time-to-repair (MTTR).
  • Cross Product Integration – New integration between HIAA, HAD and Hitachi Data Instance Director (HDID) enables new opportunities for AI-enhanced management practices.  HIAA can now directly execute QoS commands or suggested problem resolutions, i.e., required resource configuration changes, seamlessly with HAD's automated management workflows.  Through its HDID integration, HAD incorporates new data protection policies, i.e., snapshots and clones, into its automated provisioning processes for improved resource orchestration based on both QoS and data resiliency best practices.
  • Improved Management Integration – Enhanced REST APIs provide increased flexibility to integrate HAD into existing management frameworks and practices.  For example, HAD can easily be integrated with IT Service Management (ITSM) ticketing systems, such as ServiceNow, to incorporate the right authentication process or be tied into a broader automated management workflow.

 

These new updates help to deliver on Hitachi’s AI Operations approach for intelligent operations, based on four key data center management steps that provide enhanced analytics and automation with a continuous feedback loop:

 

Hitachi's AI Operations Approach for Intelligent Operations


  • Alert: Utilize ML to continuously monitor across multi-domains (virtual machines, servers, network and storage) and quickly be alerted for performance anomalies while ensuring service levels for business applications.  This helps to filter out unwanted noise and events, so you can keep focused on avoiding problems or issues that might affect your users.
  • Analyze: Leverage algorithms to identify historical trends, patterns or changing application workloads to be better informed on how to optimize resources on the data path or increase utilization of underutilized resources.
  • Recommend: Provide new insights to quickly identify the root cause of problems or analyze evolving requirements to optimally plan for new configurations that may be required for data center expansion.
  • Resolve: Drive action with integrated workflows or orchestration to streamline adaptive configuration changes or necessary problem fixes.

 

These new integrated operational capabilities can help you to better analyze, plan and execute the changes necessary to optimize IT operations.  This ensures data center systems are running efficiently and at the right cost, which is the real promise of AI Operations.  Whether it’s highlighting new trends, identifying problems faster or improving delivery of new resources, AI Operations’ greatest impact is to help IT administrators do their jobs better with the right insights, so they can focus on projects that have a strategic impact on their business.

 

You can read more great blogs in this series here:

Hu Yoshida's blog - Data Center Modernization - Transforming Data Center Focus from Infrastructure to Information

Nathan Muffin's blog – Data Center Modernization

Mark Adams's blog - Infrastructure Agility for Data Center Modernization

Summer Matheson's blog - Bundles are Better

Paula Phipps' blog - The Super Powers of DevOps to Transform Business

 


Richard Jew's Blog

 

*Sources

AIOps Platform Enabling Continuous Insights Across IT Operations Management (ITOM)

Market Guide for AIOps Platforms - Gartner, August 2017

Looking At, And Beyond a Storage / Server Mindset

 

For those that want to skip ahead: Data center modernization – creating the next generation data center – requires you to consider systems, protection and operations. Many vendors think all you need is new systems - WRONG! As we’ll discuss, that mindset ends up raising costs and can inhibit innovation. At the end are links to dig deeper on this topic. Check out the press release too!

 

Ever seen Joy of Painting? If not, check it out. Bob Ross was a great painter and a calming force, which is impressive given his history as a US Air Force drill sergeant. A recurring component of Ross’ shows and paintings is ‘happy little trees.’ In fact, happy trees show up so often that there are memes, shirts and more for the phrase.

 

Interestingly, there’s a strong correlation between happy trees and vendors simplifying data center modernization as nothing more than a refresh of systems – storage, server, networking. That may appeal to the IT junkies in us, but thinking modernization only equals new systems is counterproductive to success.

 

Why? Because data center modernization encompasses a lot more than systems. People may love Bob Ross’ happy trees, but without the rest of the landscape it isn’t a picture. It’s incomplete. Similarly, if you don’t look beyond systems when modernizing, your costs may go UP instead of down and innovation may slow down.

 

Why Modernize?

 

To understand why a system-only mindset can hurt you, it helps to consider why IT leaders modernize to create next generation data centers. Priorities vary, but the reasons I continually hear are:

 

  • Increase operational efficiency and reduce capital expenditures
  • Accelerate time to market for new projects and programs
  • Improve customer experiences
  • Minimize risk to the business

 

Said another way, data center modernization is about supporting and accelerating innovation. Refreshing systems to get new functionality and meet evolving SLAs is absolutely a part of this, but it only gives you modern systems – not a modern, next generation, data center. So, if your vendor only focuses on systems… watch out!

 

Why Systems Alone Don’t Modernize a Data Center

 

If I put a jet engine on a 1970s Chevy, do I have a modern car? That’s pretty clearly a big ‘no.’ I may have a new engine, but everything that surrounds it is not optimized for the jet engine. The stability of the vehicle, the quality of the driver experience and more need to be modernized to support that engine! And since the owner is likely to spend more time fixing things than driving, where’s the ROI!?!?!?

 

Here’s one that hits closer to home (thanks Gary Breder). If you have a single camera checking the entrance to a data center, you can have one person monitoring it. But what if you add 4 or even 100 cameras to cover the inside and outside of the facility? Can one person watch all those feeds? Not well. You could linearly add staff as camera counts increase, but that adds cost and eats up your staff’s time, keeping them from more strategic work! Instead, you need to rethink – modernize – management.

 

The same is true with data centers. You must modernize processes – and protection – to scale with the changes in your environment. Otherwise you have inefficiencies that cost you time, money and agility.

 

A More Complete Approach to Modernization

 

There are several ways to ‘slice’ a broader view of data center modernization, but I like to keep things simple, so let’s break things down into 3 categories to start:

 

  1. Agile Data Infrastructure
  2. Modern Data Protection
  3. Intelligent (AI) Operations

 

These categories allow us to cover the areas where we are most likely to enhance systems, software and operational processes for increased operational efficiency, faster time to market, improved customer experiences and accelerated adoption of new business models. We can define each area as follows:

 

Agile Data Infrastructure: The combination of systems needed to consistently deliver data at high speed for the best possible experience. These systems should be resilient, support a broad, diverse range of use cases and enable the business to move quickly without being constrained.

    • Buying Consideration: If you have more than one application in your data center, odds are you’ll need a few different types of systems, so look for a vendor that can offer a range of choices to meet your needs.

 

Modern Data Protection: Software and processes that ensure continuous availability of data – independent of external influences – in support of the customer experience. Modern protection also supports adherence to compliance requirements, new regulations and data security.

    • Buying Consideration: With new data privacy guidelines and concerns about security, data protection is becoming even more complex. Look for a partner that has a solid consulting team and knows how to integrate their offering into your existing framework.

 

Intelligent (AI) Operations: Integrated software that leverages AI and machine learning to analyze, predict, prescribe and execute changes to achieve data center SLAs. This software ensures systems are continually optimized for peak performance, stability and cost efficiency. This frees data center staff to focus on strategic initiatives / implementing new technologies, accelerating innovation.

    • Buying consideration: This is an emerging area that will change a lot over the next few years. Be sure to look at vendors with an API integration focus. This will let them integrate their products with other vendor offerings to create a ‘collaborative’ AI or a ‘hive mind’ for deeper insights and more robust automation. Check out our AI Operations portfolio including Hitachi Infrastructure Analytics Advisor and Automation Director.

 

If I go back to the car analogy, a next generation car will certainly have an engine (system), but it will also have a new user interface and challenge our thoughts on driving (AI operations) as well as the rules / regulations of the roads (protection). Kind of like this image of how the inside of a self-driving car might change in the future.

 

Or as Bob Ross might say, happy little trees are wonderful, but without the sky, clouds, and other things, it really isn’t a complete picture.

 

Hey, That’s It?

 

Hold on! We didn’t describe each of the areas! I know. That happens in other blogs, coming soon!

 

In the meantime, check out the press release for our new VSP systems and AI Operations software. Also check out this video series we did on data management and creating an integrated AI operations portfolio.

https://www.hitachivantara.com/go/storage-switzerland-videos/

 

Ask ten people for their thoughts on Artificial Intelligence and you will get answers that span the emotional range from “Alexa is great!” to “HAL 9000: I’m sorry Dave, I’m afraid I can’t do that”.

 

Personally, I believe that we need to embrace this nascent technology and trust that we will never need to meet HAL 9000, good intentions or not.

 

So how does Intelligent Automation impact YOU, a knowledge worker in high tech?  Especially if you’re a highly valued and  highly stressed member of an IT team, responsible for responding quickly and often to business & client needs, while at the same time ensuring that you’re “keeping the lights on” with zero impact to users’ ability to access business applications.

 

You’ve read in my previous blogs how Hitachi Vantara’s Hitachi Automation Director software can help accelerate resource development and reduce manual steps by >70%.

 

THIS IS PART II of the blog - how to get an ALEXA SKILL up and running.

 

PART III of the blog will be posted later: Hitachi Automation Director’s capability to be integrated with ALEXA SKILL.

 

Today, let’s take it a step further by discussing what you can do with Hitachi Automation Director’s flexible REST API, with the necessary context passed via JSON payload. Specifically, how HAD’s infrastructure service catalog can be presented as menu items for an upper-layer CMP (Cloud Management Platform) or a voice-oriented CMP via an Alexa Skill. The Alexa demo is a technology preview that showcases how HAD can integrate with a northbound cloud management layer.
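
As a rough illustration of that idea, the sketch below pulls HAD’s service catalog over REST and flattens it into menu items a northbound CMP, or an Alexa skill, could present. The URL, endpoint paths and field names are assumptions made for this discussion, not the formally documented HAD API.

# Illustrative sketch: surface HAD's service catalog to a northbound CMP or
# voice front end. Paths and field names are assumptions, not the documented API.
import requests
from requests.auth import HTTPBasicAuth

HAD_BASE = "https://had.example.com:22015/Automation/v1"   # assumed base URL
AUTH = HTTPBasicAuth("alexa_demo", "********")             # assumed demo account

def list_catalog_items():
    """Fetch the infrastructure service catalog and return (name, id) menu items."""
    resp = requests.get(HAD_BASE + "/services", auth=AUTH, timeout=30)
    resp.raise_for_status()
    return [(svc["displayName"], svc["id"]) for svc in resp.json()["data"]]

def submit_catalog_item(service_id, capacity_gb):
    """Submit one catalog item with the context captured from the voice dialog."""
    payload = {"parameters": {"capacityGB": capacity_gb}}
    resp = requests.post(HAD_BASE + "/services/{0}/submit".format(service_id),
                         json=payload, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()

The Alexa skill in the rest of this post stops at the dialog layer; in a full integration, an intent handler like set_capacity_in_session would call something like submit_catalog_item() instead of only echoing the request back.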

 

That’s correct – use ALEXA in conjunction with Hitachi Automation Director to provision your storage, among other cool things – whoa!!!

 

 

 

FULL DISCLOSURE:  This is a technology preview demo to showcase Hitachi Automation Director capabilities, as a proof of concept. We have created an Alexa and Hitachi Automation Director demo and have shown customers what can be done. Today, this is not a formally released or supported feature. For actual production use, you need to consider factors such as cloud integration, security, and contractual obligations with the cloud vendor (Amazon Alexa, Google Home, etc.).

 

Are you ready to get started?  Some minor (or major) coding to follow, depending on your background.

The following is credited to Craig Chan, Hitachi Vantara’s Director of SW Strategy, resident man of a thousand hats and former developer, in his own words.

 

It’s always easier to leverage someone else’s work and that’s just what I did here.  I knew I wanted to use Python because we have sample Python scripts for our products that are easy to use.  So, a quick search on the internet turned up a nice Alexa Python Tutorial – Build a voice experience in 5 minutes or less.  What a great way to get started!  Setting up this dialog was a snap and it did only take 5 minutes.  I spent more time setting up my Amazon developer account.  Remember last time I took you through the dialog with Alexa to manage my storage?  Well here is where it all began.  Let’s look at the Color Picker dialog:

ME

“Alexa, open Color Picker”

ALEXA

“Welcome to the Alexa skills kit sample. Please tell me your favorite color by saying, my favorite color is red.”

ME

“My favorite color is blue.”

 

ALEXA

“I now know your favorite color is blue. You can ask me your favorite color by saying, what’s my favorite color?”

 

ME

“What’s my favorite color?”

 

ALEXA

“Your favorite color is blue. Goodbye.”

 

Simple, but effective.  I created my first Alexa Skill and it didn’t take much effort!  Now how does that all work?  Well the trick here is all the code is already written for you.  By selecting the “blueprint” you were basically taking Amazon’s code and using it as a skill.  We have to read through that code to understand how it works, but instead of pasting 207 lines of code here, go ahead to Amazon’s site and check it out.

Now I want to take that dialog and make it storage specific.  So instead of the above dialog, the new dialog looks like this:

 

ME

“Alexa, open my storage”

 

ALEXA

“Your storage is running properly. How can I help you manage your storage today? You can say, create a volume of 100 gigabytes.”

 

ME

“Create a volume of 20 gigabytes.”

 

ALEXA

“Okay. I can create a volume of 20 gigabytes. I can also format your volume for you. Just say, please format my volume.”

 

ME

“Please format my volume.”

 

ALEXA

“I am formatting your volume of 20 gigabytes. Formatting volume complete. Thank you for using Alexa to manage your storage.”

 

Great! Alexa just created and formatted a volume of 20 GB!  Well, not exactly.  You had that dialog with Alexa, but it didn’t really do anything.  Having the dialog is pretty cool though, and it did hear what capacity you asked for and listened to your request to format it.  What happened here is I took the “myColorPicker” function and just modified the text.  I also wanted to know what variables were being saved so I changed those as well.  Now instead of saving my favorite color, it was saving my capacity choice.  Take a look at the code I attached here. It’s in Python so it’s pretty easy to read through.

 

As you read through the code you might have noticed something called an “intent”, or if you were paying real close attention, you might have noticed something else called a “slot”.  Intents are defined in the Amazon developer portal where you develop the actual skill that uses the code you put into Lambda.  The Color Picker Skill uses “MyColorIsIntent” and “WhatsMyColorIntent”.  The slot is the “LIST_OF_COLORS” or choices that you have for colors (I added purple to mine).  For my new skill, let’s call it VSPG Storage Creator, I changed the intents to “MyCapacityIsIntent” and “FormatVolumeIntent”.  Then I changed the slot to “LIST_OF_CAPACITIES”.  Now I didn’t want to go wild with capacities so only capacities of 10-100 in increments of 10 were allowed.  And one last thing, some sample utterances.  These are the phrases you are expecting the person talking to Alexa to say. Depending on how flexible you want Alexa to be, you can change this to whatever you want, but for simplicity, I just modified the Color Picker ones to “MyCapacityIsIntent Create a volume of {Capacity} gigabytes” and “FormatVolumeIntent please format my volume”.

 

Okay, that was a lot to read, and probably confusing unless brought into context.  Let’s follow the instructions below to first setup Lambda:

 

 

 

 

Code?! Yes code!  But this code is pretty easy, even if it’s really long.  So to make it easier on you, just copy and paste the code below to replace the contents of lambda_function.py.

"""

This is a demo VSP-G Storage skill built with the Amazon Alexa Skills Kit.

 

"""

 

from __future__ import print_function

 

 

# --------------- Helpers that build all of the responses ----------------------

 

def build_speechlet_response(title, output, reprompt_text, should_end_session):

    return {

       'outputSpeech': {

            'type': 'PlainText',

            'text': output

        },

        'card': {

            'type': 'Simple',

            'title': "SessionSpeechlet - " + title,

            'content': "SessionSpeechlet - " + output

        },

        'reprompt': {

            'outputSpeech': {

                'type': 'PlainText',

                'text': reprompt_text

            }

        },

        'shouldEndSession': should_end_session

    }

 

 

def build_response(session_attributes, speechlet_response):

    return {

        'version': '1.0',

        'sessionAttributes': session_attributes,

        'response': speechlet_response

    }

 

 

# --------------- Functions that control the skill's behavior ------------------

 

def get_welcome_response():

    """ If we wanted to initialize the session to have some attributes we could

    add those here

    """

 

    session_attributes = {}

    card_title = "Welcome"

    speech_output = "Your storage is running properly. " \

                    "How can I help you manage your storage today? " \

                    "You can say, create a volume of 100 gigabytes."

    # If the user either does not reply to the welcome message or says something

    # that is not understood, they will be prompted again with this text.

    reprompt_text = "Sorry, I didn't catch that. " \

                    "How can I help you manage your storage today? " \

                    "You can say, create a volume of 100 gigabytes."

    should_end_session = False

    return build_response(session_attributes, build_speechlet_response(

        card_title, speech_output, reprompt_text, should_end_session))

 

 

def handle_session_end_request():

    card_title = "Session Ended"

    speech_output = "Thank you for managing your storage with Alexa. " \

                    "Have a nice day! "

    # Setting this to true ends the session and exits the skill.

    should_end_session = True

    return build_response({}, build_speechlet_response(

        card_title, speech_output, None, should_end_session))

 

 

def create_desired_capacity_attributes(desired_capacity):

    return {"desiredCapacity": desired_capacity}

 

 

def set_capacity_in_session(intent, session):

    """ Sets the capacity in the session and prepares the speech to reply to the

    user.

    """

 

    card_title = intent['name']

    session_attributes = {}

    should_end_session = False

 

    if 'Capacity' in intent['slots']:

        desired_capacity = intent['slots']['Capacity']['value']

        session_attributes = create_desired_capacity_attributes(desired_capacity)

        speech_output = "Okay. I can create a volume of " + \

                        desired_capacity + " gigabytes"\

                        ". I can also format your volume for you. " \

                        "Just say, please format my volume."

        reprompt_text = "I can also format your volume for you. " \

                        "Just say, please format my volume."

    else:

        speech_output = "I don't have that capacity available. " \

                        "Please try again."

        reprompt_text = "I don't have that capacity available. " \

                        "Please tell me a capacity number I can use."

    return build_response(session_attributes, build_speechlet_response(

        card_title, speech_output, reprompt_text, should_end_session))

 

 

def format_volume_from_session(intent, session):

    session_attributes = {}

    reprompt_text = None

 

    if session.get('attributes', {}) and "desiredCapacity" in session.get('attributes', {}):

        desired_capacity = session['attributes']['desiredCapacity']

        speech_output = "I am formating your volume of " + desired_capacity + " gigabytes"\

                        ". Formating volume complete. Thank you for using Alexa to manage your storage."

        should_end_session = True

    else:

        speech_output = "I don't have any capacity to format. " \

                        "You can say, create a volume of 100 gigabytes."

        should_end_session = False

 

    # Setting reprompt_text to None signifies that we do not want to reprompt

    # the user. If the user does not respond or says something that is not

    # understood, the session will end.

    return build_response(session_attributes, build_speechlet_response(

        intent['name'], speech_output, reprompt_text, should_end_session))

 

 

# --------------- Events ------------------

 

def on_session_started(session_started_request, session):

    """ Called when the session starts """

 

    print("on_session_started requestId=" + session_started_request['requestId']

          + ", sessionId=" + session['sessionId'])

 

 

def on_launch(launch_request, session):

    """ Called when the user launches the skill without specifying what they

    want

    """

 

    print("on_launch requestId=" + launch_request['requestId'] +

          ", sessionId=" + session['sessionId'])

    # Dispatch to your skill's launch

    return get_welcome_response()

 

 

def on_intent(intent_request, session):

    """ Called when the user specifies an intent for this skill """

 

    print("on_intent requestId=" + intent_request['requestId'] +

          ", sessionId=" + session['sessionId'])

 

    intent = intent_request['intent']

    intent_name = intent_request['intent']['name']

 

    # Dispatch to your skill's intent handlers

    if intent_name == "MyCapacityIsIntent":

        return set_capacity_in_session(intent, session)

    elif intent_name == "FormatVolumeIntent":

        return format_volume_from_session(intent, session)

    elif intent_name == "AMAZON.HelpIntent":

        return get_welcome_response()

    elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":

        return handle_session_end_request()

    else:

        raise ValueError("Invalid intent")

 

 

def on_session_ended(session_ended_request, session):

    """ Called when the user ends the session.

 

    Is not called when the skill returns should_end_session=true

    """

    print("on_session_ended requestId=" + session_ended_request['requestId'] +

          ", sessionId=" + session['sessionId'])

    # add cleanup logic here

 

 

# --------------- Main handler ------------------

 

def lambda_handler(event, context):

    """ Route the incoming request based on type (LaunchRequest, IntentRequest,

    etc.) The JSON body of the request is provided in the event parameter.

    """

    print("event.session.application.applicationId=" +

          event['session']['application']['applicationId'])

 

    """

    Uncomment this if statement and populate with your skill's application ID to

    prevent someone else from configuring a skill that sends requests to this

    function.

    """

    # if (event['session']['application']['applicationId'] !=

    #         "amzn1.echo-sdk-ams.app.[unique-value-here]"):

    #     raise ValueError("Invalid Application ID")

 

    if event['session']['new']:

        on_session_started({'requestId': event['request']['requestId']},

                           event['session'])

 

    if event['request']['type'] == "LaunchRequest":

        return on_launch(event['request'], event['session'])

    elif event['request']['type'] == "IntentRequest":

        return on_intent(event['request'], event['session'])

    elif event['request']['type'] == "SessionEndedRequest":

        return on_session_ended(event['request'], event['session'])

 

You’ve just coded your very own Alexa skill! As you put that Python script into Lambda, you might have noticed that we created our own names for the intents.  This leads us into configuring the skill to work with our intents.  Intents are things you want to happen.  For us, it’s about creating a volume and formatting that volume.  For these intents, we need to define a set of valid values (capacity amounts) and utterances (phrases that Alexa will understand).  Let’s configure our skill.
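
For reference, the pieces you define in the Amazon developer portal for this demo look roughly like the following. This uses the classic intent-schema format that the original Color Picker blueprint used; the newer skill builder captures the same information in a single JSON interaction model, so treat the exact layout as an approximation.

Intent schema:

{
  "intents": [
    { "intent": "MyCapacityIsIntent",
      "slots": [ { "name": "Capacity", "type": "LIST_OF_CAPACITIES" } ] },
    { "intent": "FormatVolumeIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.CancelIntent" },
    { "intent": "AMAZON.StopIntent" }
  ]
}

Custom slot type LIST_OF_CAPACITIES (values): 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100

Sample utterances:

MyCapacityIsIntent Create a volume of {Capacity} gigabytes
FormatVolumeIntent please format my volume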

 

 

And we are done! Go ahead and test your new Alexa Skill and see how you can interact with Alexa.  Try different utterances and even different dialog in the code so Alexa says different things back to you.  Also give your own invocation name so it becomes your very own unique skill. 

 

Stay tuned for Part III of the blog, same time same channel!

 

Forward!!

Per IDC’s Copy Data Management Challenge, “65% of storage system capacity is used to store non-primary, inactive data”. In fact, inactive data residing on flash is the single biggest threat to the ROI of data center modernization initiatives. As my friends and I get older we often find ourselves talking about “right-sizing”, or moving to a smaller, less expensive piece of real estate. In many cases life has evolved or perhaps needs have changed, and we simply don’t want or need the cost of maintaining a larger residence.  The benefits are obvious when it comes to savings in mortgage, utilities, and general upkeep.  Well, your data has a lifecycle as well: it starts out very active, with high IO and the need for a high quality of service and low latency.  However, as data ages it’s just not economical or necessary for it to reside in the “high rent” district, which in most cases is an active flash tier.

 

 

 

Now, data tiering through the lifecycle is not a new concept, but the options available to you have never been greater.  Lower cost/performance tiers of spinning disk are one destination.  If your organization has deployed a private cloud, that might be an excellent destination to tier inactive data.  For those who have adopted public cloud IaaS, that certainly is a low-cost destination as well.  Let’s explore some of these options and solutions for managing the data lifecycle.  More importantly, let us look at some issues that should be considered before creating a data lifecycle residency plan, with the goal of maximizing your current investments in on-premises all-flash arrays and both private and public clouds.

 

Automated Storage Tiering

Deploying automated storage tiering is a good place to get started, as the concept is familiar to most storage managers.  For example, Hitachi Dynamic Tiering is software which allows you to create 3-tier pools within the array and enact policies that automatically move data to a specified type of media pool once pre-defined criteria are met.
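
To picture the kind of policy logic involved, here is a purely illustrative model of tier placement. It is not Hitachi Dynamic Tiering’s actual algorithm (HDT works on page-level IO counters inside the array), and the thresholds are made-up assumptions chosen only to show the idea of policy-driven movement between pools.

# Purely illustrative policy model -- NOT Hitachi Dynamic Tiering's algorithm.
# Thresholds are assumptions used only to show how activity and age can map
# data onto flash (SSD/FMD), SAS and NL-SAS pools.
from dataclasses import dataclass

@dataclass
class Extent:
    iops_per_gb: float        # recent IO density for this chunk of data
    days_since_access: int    # how long since it was last touched

def choose_pool(extent: Extent) -> str:
    if extent.iops_per_gb > 5.0:           # hot: keep on the flash pool
        return "tier1_flash"
    if extent.days_since_access <= 30:     # warm: mid-tier disk pool
        return "tier2_sas"
    return "tier3_nlsas"                   # cold: low-cost capacity pool

# Example: data untouched for 90 days with negligible IO lands on tier 3.
print(choose_pool(Extent(iops_per_gb=0.1, days_since_access=90)))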

 

 

In a modern hybrid flash array like the Hitachi Vantara VSP G series, your pools can be defined based upon the type of storage media in the array.  This is especially economical in the VSP G series because the array can be configured with SSDs, Hitachi Flash Modules (FMD), or hard disk drives.  Essentially, high IO data starts residency on high performance SSD or FMD, and then dynamic tiering automatically moves it to low cost hard disk drives as it ages and becomes less active.  The savings in storage real estate in terms of cost per GB can be well over 60%.  But wait, there are more benefits to be had.

 

Integrated Cloud Tiering – Private Clouds and Object Storage

It’s no secret that migrating inactive data to the cloud can lead to storage savings well over 70%.  The benefit doesn’t stop there, as a well-managed data lifecycle frees up top tier flash for higher priority active data. Many top financial institutions choose to tier inactive data off flash tiers and onto lower cost private cloud object storage.  In this way, they get the savings of moving this data into a low-cost tier, and there are no questions of data security and control behind the corporate firewall.  If the data ever needs to be moved back to an active tier, it can be done quickly and inexpensively without the data egress fees incurred by public cloud providers.  In addition, private cloud object storage like the Hitachi Content Platform (HCP) gives your enterprise a “rent-controlled” residence with all the benefits of a public cloud and without concerns about security, because you are in control of your data.

 

Cloud Tiering – Public Clouds

Public clouds like Amazon and Azure have changed the data residency landscape forever.  They are an excellent “low-cost” neighborhood for inactive data.  Companies of all sizes, from small to the largest enterprise, leverage the low cost and ‘limitless’ storage of public clouds as a target for inactive and archive data.

 

Potential Issues - Tiering Data to the Cloud

The concept of tiering to either public or private clouds is simple, but executing a solution may not be as straightforward.  Many storage vendors claim the ability to tier to the cloud, but when you look at their solutions, you’ll often find that they are not transparent to applications and end-users, often requiring significant rework of your IT architecture and delivering a lower or delayed ROI in data center modernization. These solutions add complexity to your environment and often add siloed management requirements. The bottom line is that very few vendors understand and offer an integrated solution of high performance flash, automated tiering, and a choice of migration to multiple cloud types. Regarding public clouds, one downside is that if the data is not quite inactive, let’s say “warm,” it can be very costly to pull it back from the public cloud due to the previously mentioned data egress fees.  Not to mention it can take a very long time, depending on the type of service level agreement. For this reason, many tenants choose to only migrate backups and cold archive data to public clouds.
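
As a back-of-the-envelope illustration of the egress point, the numbers below are placeholder assumptions, not actual provider prices; the point is the shape of the math, not the totals.

# Back-of-the-envelope only: every figure is an assumed placeholder, not a quote
# from any cloud provider. It shows why "warm" data and egress fees don't mix.
capacity_tb = 100                     # data tiered to a public cloud cool tier
storage_per_gb_month = 0.01           # assumed $/GB/month storage price
egress_per_gb = 0.09                  # assumed $/GB to pull data back out
recall_fraction_per_month = 0.20      # 20% of the data turns out to be "warm"

gb = capacity_tb * 1024
monthly_storage = gb * storage_per_gb_month
monthly_egress = gb * recall_fraction_per_month * egress_per_gb

print("storage: ${:,.0f}/mo, egress: ${:,.0f}/mo".format(monthly_storage, monthly_egress))
# With these assumptions, recalling just 20% of the data costs nearly twice the
# monthly storage bill -- which is why truly cold data is the better fit for public cloud.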

 

Hitachi Vantara Cloud-Connected Flash – 1 Recipe integrates all the right Ingredients

Cloud-Connected Flash is a solution from Hitachi Vantara which delivers complete file data lifecycle residency.  Hitachi is a leading vendor in hybrid flash arrays, all-flash arrays, object storage and cloud.

 

 

The solution is an easy concept to understand: “Data should reside on its most economical space”.  As illustrated in the graphic above, active NAS data is created and maintained in a Hitachi Vantara VSP G or F series unified array.  Within the array, data is dynamically tiered based on the “migration policy” between pools of SSD, FMD (Hitachi Flash Modules) and disk drives.  As the file data ages, Hitachi data migration to cloud software moves the data, based on policy, to your choice of clouds (public Amazon, Azure, IBM Cloud or private HCP, Hitachi Content Platform).  When migrating data to HCP, the data can also be pulled back into the top flash tiers if needed, creating a highly flexible and dynamic data residency environment.  But the value doesn’t stop at the savings in terms of cost of storage and maintenance. The Hitachi cloud-connected flash solution can also include an analytics component to glean insights from your data lake, which comprises both on-premises and cloud data.

 

Cloud to Data Insights with Pentaho Analytics

Pentaho Analytics enables the analysis of your data both on premises and in public and private clouds. As an expert in IoT and analytics, Hitachi Vantara offers the customizable Pentaho platform to analyze data and create dashboards and reports in your cloud-connected flash environment. The goal is to achieve better business outcomes by leveraging your Hitachi Vantara cloud-connected flash solution. Pentaho offers data integration, a highly flexible platform for blending, orchestrating, and analyzing data from virtually any source, effectively reaching across system, application and organizational boundaries.

  • Run Pentaho on a variety of public or private cloud providers, including Amazon Web Services (AWS) and Microsoft Azure
  • Leverage scale-out architecture to spread data integration and analytics workloads across many cloud servers
  • Connect on-premises systems with data in the cloud and blend a variety of cloud-based data sources together
  • Seamlessly integrate and analyze leading Big Data sources in the Cloud, including Amazon Redshift and hosted Hadoop distributions
  • Access and transform data from web services and enterprise SaaS applications

 

 

Learn More, Hitachi Vantara Can Help

Hitachi Vantara Cloud-Connected Flash solutions are highly flexible, and individual components can be deployed based on business needs.  This enables customers to start with dynamic array tiering, then add cloud tiering and ultimately analytics. Please contact your Hitachi Representative or Hitachi Partner to learn how your organization can benefit from cloud-connected flash.

 

NVMe (non-volatile memory express) is a standard designed to fully leverage the performance benefits of non-volatile memory in all types of computing and data storage.  NVMe’s key benefits include direct access to the CPU, lightweight commands, and highly scalable parallelism, which all lead to lower application latency and insanely fast IO. There has been a lot of hype surrounding NVMe in the press and how it can put your IT performance into the equivalent of Tesla’s “Ludicrous mode”, but I would like to discuss some “real world” considerations where NVMe can offer great benefits, and perhaps shine a caution light in areas that you might not have considered.  As NVMe is in its infancy as far as production deployment, its acceptance and adoption are being driven by a need for speed.  In fact, Hitachi Vantara recently introduced NVMe caching in its hyperconverged Unified Compute Platform (UCP).  This is a first step to mobilizing the advantages of NVMe to accelerate workloads by using NVMe in the caching layer.

 

Parallel vs Serial IO execution - The NVMe Super Highway

What about storage? Where are the bottlenecks, and why can’t SAS and SATA keep up with today’s high performance flash media?  The answer is that both SAS and SATA were designed for rotating media, long before flash was developed; consequently, these command sets have become the traffic jam on the IO highway. NVMe is a standard based on peripheral component interconnect express (PCIe), and it’s built to take advantage of today’s massively parallel architectures.  Think of NVMe as a Tesla Model S capable of achieving 155mph in 29 seconds, stifled by old infrastructure (SAS/SATA) and a 55mph speed limit. All that capability is wasted. So what is driving the need for this type of high performance in IT modernization?  For one, Software-Defined Storage (SDS) is a rapidly growing technology that allows for the policy-based provisioning and management of data storage independent of the underlying physical storage. As data center modernization is at the core of IT planning these days, new technologies such as Software-Defined Storage are offering tremendous benefits in data consolidation and agility. As far as ROI and economic benefits, SDS’s ability to be hardware agnostic, scale seamlessly, and deliver simplified management is a total victory for IT. So then what is the Achilles heel for SDS and its promise to consume all traditional and modern workloads?  Quite frankly, SDS has been limited by the performance constraints of traditional architectures. Consequently, many SDS deployments are limited to applications that can tolerate the latency caused by the aforementioned bottlenecks.

 

Traditional Workload: High-Performance OLTP and Database Applications

Traditional OLTP and database workloads are the heartbeat of the enterprise. I have witnessed instances of customers having SDS deployments fail because of latency between storage and the application, even when flash media was used.  Surely the SDS platform, network, and compute were blazing fast, but the weak link was the SAS storage interface.  Another problem is that any type of virtualization or abstraction layer used to host the SDS instance on a server is going to consume more performance than running that service on bare metal. In an SDS environment, highly transactional applications will require the additional IOPS to keep latency from the virtualization layer in check and deliver the best quality of service to the field.  At the underlying storage level, traditional SAS and SATA constrain flash performance. The bottom line is that NVMe inherently provides much greater bandwidth than traditional SAS or SATA.  In addition, NVMe at the media level can handle 64,000 queues compared to SAS (254 queues) and SATA (32 queues).   This type of performance and parallelism can enable high-performance OLTP and deliver optimized performance with the latest flash media.  So the prospect is that more traditional high-performance OLTP workloads can be migrated to an SDS environment enabled by NVMe.

 

 

Caveat Emptor – The Data Services Tax

The new world of rack-scale flash, SDS, and hyperconverged infrastructure offerings promises loftier performance levels, but there are speed bumps to be considered.  This is especially true when considering the migration of mission-critical OLTP applications to a software-defined environment.  The fact of the matter is that data services (compression, encryption, RAID, etc.) and data protection (snaps, replication, and deduplication) reduce IOPS. So be cautious when considering a vendor’s IOPS specification, because in most cases the numbers are for unprotected and un-reduced data. In fact, data services can impact IOPS and response times to the extent that AFAs with NVMe will not perform much better than SCSI-based AFAs. The good news is that NVMe performance and parallelism should provide plenty of horsepower (IOPS) to enable you to move high performance workloads into an SDS environment. The bad news is that you will need your hardware architecture to be more powerful and correctly designed to perform data services and protection faster than ever before (e.g. more IO received per second = more deduplication processes that must occur every second). Note that you also need to consider whether or not your SDS application, compute, network and storage are designed to take full advantage of NVMe’s parallelism.  Also note that a solution is only as fast as its weakest link, and for practical purposes that could be your traditional network infrastructure. If you opt for NVMe on the back-end (between storage controllers and media) but do not consider how to implement NVMe on the front-end (between storage and host / application), you may just be pushing your performance bottleneck to another point of IO contention and you won't get any significant improvement.

 

Modern Workload: Analytics at the Edge

It seems as though “Analytics” has replaced “Cloud” as the IT modernization initiative du jour. This is not just hype, as the ability to leverage data to understand customers and processes is leading to profitable business outcomes never before possible. I remember just a few years ago the hype surrounding Hadoop and batch analytics in the core data center and in the cloud.  It was only a matter of time before we decided that the best place to produce timely results and actions from analytics is at the edge. The ability to deploy powerful compute in small packages makes analytics at the edge (close to where the data is collected) a reality.  The fundamental benefit is the network latency saved by having the compute function at the edge.  In the analytics architectures of a few years ago, data would travel via a network, or telemetry to network, and then to the cloud. That data would be analyzed and the outcome delivered back the same way it arrived.  So edge analytics cuts out data traversing the network and saves a significant chunk of time.  This is the key to enabling time sensitive decisions, like an autonomous vehicle avoiding a collision in near real-time.  Using NVMe/PCIe, data can be sent directly to a processor at the edge to deliver the fastest possible outcomes.  NVMe enables processing latency to be reduced to microseconds and possibly nanoseconds.  This might make you a little more comfortable about taking your hands off the wheel and letting the autonomous car do the driving…

 

The Take Away

My advice to IT consumers is to approach any new technology with an open mind and a teaspoon of doubt. Don’t get caught up in hype and specs. Move at a modernization pace that is comfortable and within the timelines of your organization.  Your business outcomes should map to solutions and not the other way around. “When in doubt, proof it out” – make sure your modernization vendor is truly a partner.  They should be willing to demonstrate a working proof of concept, especially when it comes to mission-critical application support.  Enjoy the new technology; it’s a great time to be in IT!

 

More on NVMe

NVMe 101: What’s Coming To The World of Flash

How NVMe is Changing Networking Part 1

How NVMe is Changing Networking Part 2

Redesigning for NVMe: Is Synchronous Replication Dead?

NVMe and Data Protection: Time To Put on Those Thinking Caps

The Brocade view on how NVMe affects network design

NVMe and Me – How To Get There

An In Depth Conversation with Cisco.

 

Welcome back! (Yes, I was waiting for this opportunity to show my age with a Welcome Back Kotter image).

 

For those of you that haven’t been following along in real time – c’mon man! – we’re in the midst of a multi-blog series around NVMe and data center design (list of blogs below).

 

That’s right, data center design. Because NVMe affects more than just storage design. It influences every aspect of how you design an application data path. At least it should if you want maximum return on NVMe investments.

 

In our last blog we got the Brocade view on how NVMe affects network design. As you might imagine, that conversation was very Fibre Channel-centric. Today we’re looking at the same concept – network design – but bringing in a powerhouse from Cisco.

 

If you've been in the industry for a while you've probably heard of him: J Michel Metz. Dr. Metz is an R&D Engineer for Advanced Storage at Cisco and sits on the Board of Directors for SNIA, FCIA and NVM Express. So… yeah. He knows a little something about the industry. In fact, check out a blog on his site called storage forces for some background on our discussion today. And if you think you know what he’s going to say, think again.

 

Ok. Let’s dig in.

 

Nathan: Does NVMe have a big impact on data center network design?

 

J: Absolutely. In fact I could argue the networking guys have some of the heaviest intellectual lift when it comes to NVMe. With hard disks, tweaking the network design wasn't nearly as critical. Storage arrays – as fast as they are - were sufficiently slow so you just sent data across the wire and things were fine.

 

Flash changed things, reducing latency and making network design more critical, but NVMe takes it to another level. As we’re able to put more storage bits on the wire, it increases the importance of network design. You need to treat it almost like weather forecasting; monitoring and adjusting as patterns change. You can’t just treat the storage as “data on a stick;” just some repository of data at the end of the wire, where you only have to worry about accessing it.

 

Nathan: So how does that influence the way companies design networks and implement storage?

 

J: To explain I need to start with a discussion of how NVMe communications work. This may sound like a bizarre metaphor, but bear with me.

 

Think of it like how food is ordered in a ‘50s diner. A waitress takes an order, puts the order ticket on the kitchen counter and rings a bell. The cook grabs the ticket, cooks the food, puts the order back on the counter and rings the bell. The waitress then grabs the order and takes it to the customer. It’s a process that is efficient and allows for parallel work queues (multiple wait staff and cooks).

 

Now imagine the customers, in this case our applications, are a mile away from the kitchen, our storage. You can absolutely have the waitress or the cook cross that distance, but it isn't very efficient. You can reduce the time to cross the distance by using a pneumatic tube to pass orders to the kitchen, but someone ultimately has to walk the food. That adds delays. Again, the same is true with NVMe. You can optimize NVMe to be transferred over a network, but you’re still dealing with the physics of moving across the network.

 

At this stage you might stop and say ‘hey, at least our process is a lot more efficient and allows for parallelism.’ That could leave you with a solid NVMe over Fabric design. But for maximum speed what you really want is to co-locate the customers and kitchen. You want your hosts as close to the storage as possible. It’s the trade-offs that matter at that point. Sometimes you want the customers in the kitchen. That’s what hyper-convergence is, but it obviously can only grow so large. Sometimes you want a centralized kitchen and many dining rooms. That’s what you can achieve with rack-scale solutions that put an NVMe capacity layer sufficiently close to the applications, at the ‘top of rack.’ And so on.

 

Nathan: It sounds like you’re advocating a move away from traditional storage array architectures.

 

J: I want to be careful because this isn’t an ‘or’ discussion, it’s an ‘and’ discussion. HCIS is solving a management problem. It’s for customers that want a compute solution with a pretty interface and freedom from storage administration. HCIS may not have nearly the same scalability as an external array, but it does allow application administrators to easily and quickly spin up VMs.

 

As we know though, there are customers that need scale. Scale in capacity; scale in performance and scale in the number of workloads they need to host. For these customers, HCIS isn’t going to fit the bill. Customers that need scale – scale across any vector – will want to make a trade-off in management simplicity for the enterprise growth that you get from an external array.

 

This also applies to networking protocols. The reason why we choose protocols like iWARP is for simplicity and addressability. You choose the address and then let the network determine the best way to get data from point A to point B. But, there is a performance trade-off.

 

Nathan: That’s an excellent point. At no point have we ever seen IT coalesce into a single architecture or protocol. If a customer needs storage scale with a high-speed network what would you recommend?

 

J: Haven’t you heard that every storage question is answered with, “It depends?”

 

Joking aside, it’s never as simple as figuring out the best connectivity options. All storage networks can be examined “horizontally.” That is the phrase I use to describe the connectivity and topology designs from a host through a network to the storage device. Any storage network can be described this way, so that it’s easy to throw metrics and hero numbers at the problem: what are the IOPS, what is the latency, what are the maximum number of nodes, etc.

 

What we miss in the question, however, is whether or not there is a mismatch between the overall storage needs (e.g., general purpose network, dedicated storage network, ultra-high performance, massive scale, ultra-low latency, etc.) and the “sweet spot” of what a storage system can provide.

 

There is a reason why Fibre Channel is the gold standard for dedicated storage networks. Not only is it a well-understood technology, it’s very, very good at not just performance, but reliability. But for some people there are other considerations to pay attention to. Perhaps the workloads don’t lend themselves to a dedicated storage network. Perhaps “good enough” is, well, “good enough.” For them, they are perfectly fine with really great performance with Ethernet to the top-of-rack, and don’t need the kind of high availability and resiliency that a Fibre Channel network, for instance, is designed to provide.

 

Still others are looking more for accessibility and management, and for them the administrative user interface is the most important. They can deal with performance hits because the management is more important. They only have a limited number of virtual machines, perhaps, so HCIS using high-speed Ethernet interconnects is perfect.

As a general rule, “all things being equal” are never actually equal. There’s no shortcut for good storage network design.

 

Nathan: Let’s look forward now. How does NVMe affect long term network and data center design?

 

J: <Pause> Ok, for this one I’m going to be very pointedly giving my own personal opinion. I think that the aspect of time is something we’ve been able to ignore for quite a while because storage was slow. With NVMe and flash though, time IS a factor and it is forcing us to reconsider overall storage design, which ultimately affects network design.

 

Here is what I mean. Every IO is processed by a CPU. The CPU receives a request – a write, for example – passes it on and then goes off to do something else. That process was fine when IO was sufficiently slow. CPUs could go off and do any number of additional tasks. But now, it’s possible for IO to happen so fast that the CPU cannot switch between tasks before the IO response is received. The end result is that a CPU can be completely saturated by a few NVMe drives.

 

Now, this is a worst-case scenario, and should be taken with a grain of salt. Obviously, there are more processes going on that affect IO as well as CPU utilization. But the basic premise is that we now have technologies that are emerging that threaten to overwhelm both the CPU and the network. The caveat here, the key take-away, is that we cannot simply swap out traditional spinning disk, or flash drives, with NVMe and expect all boats to rise.

In my mind this results in needing more intelligence in the storage layer. Storage systems, either external arrays or hyperconverged infrastructures, will ultimately be able to say no to requests and ask other storage systems for help. They’ll work together to coordinate and decide who handles tasks like an organic being.

 

Yes, some of this happens as a result of general machine learning advancements, but it will be accelerated because of technologies like NVMe that force us to rethink our notion of time. This may take a number of years to happen, but it will happen.

 

Nathan: If storage moves down this path, what happens to the network?

 

J: Well, you still have a network connecting storage and compute but it, too, is more intelligent. The network understands what its primary objectives are and how to prioritize traffic. It also knows how to negotiate with storage and the application to determine the best path for moving data back and forth. In effect, they can act as equal peers to decide on the best route.

 

You can also see a future where storage might communicate to the network details about what it can and can’t do at any given time. The network could then use this information to determine the best possible storage device to leverage based on SLA considerations. To be fair, this model puts the network in a ‘service broker’ position that some vendors may not be comfortable with. But since the network is a common factor that brings storage and servers together it creates opportunity for us to establish the best end-to-end route.

 

In a lot of ways, I see end-to-end systems coming together in a similar fashion to what was outlined in Conway’s game of life. What you’ll see is data itself self-organizing based on priorities that are important for the whole system – the application, the server, the network and the storage. In effect, you’ll have autopoiesis, a self-adaptive system.

 

I should note that what I’m referring to here are really, really large systems of storage, not necessarily smaller host-to-storage-array products. There are a lot of stars that need to align before we can see something like this as a reality. Again, this is my personal view.  

 

Nathan: I can definitely see why you called this out as your opinion. You’re looking pretty far into the future. If we pull back to the next 18 – 24 months, how do NVMe fabrics play out?

 

Nathan: I know. I’m constraining you. Sorry about that.

 

J: <Laughs> In the near term we’re going to see a lot of battles. That’s to be expected because the standards for NVMe over Fabrics (NVMe-oF) are still relatively new.

 

Some vendors are taking shortcuts and building easy-to-use proprietary solutions. That gives them a head start and improves traction with customers and mind share, but it doesn't guarantee a long-term advantage. DSSD proved that.

 

The upside is that these solutions can help the rest of the industry identify interesting ways to implement NVMe-oF and improve the NVMe-oF standard. That will help make standards-based solutions stronger in the long run. The downside is that companies implementing early standards may feel some pain.

 

Nathan: So to close this out, and maybe lead the witness a bit: is the safest way to implement NVMe – today – to implement it in an HCI solution and wait for the NVMe-oF standards to mature?

 

J: Yeah. I think that is fair to say, especially if there is a need to address manageability challenges. HCIS absolutely helps there. For customers that do need to implement NVMe over Fabrics today, Fibre Channel is probably the easiest way to do that. But don’t expect FC to be the only team on the ball field, long term.

 

If I go back to my earlier point, different technologies are optimized for different needs. FC is a deterministic storage network and it’s great for that. Ethernet-based approaches, though, can be good for simplicity of management, though it’s never a strict “either-or” when looking at the different options.

 

I expect Ethernet-based NVMe-oF to be used for smaller deployment styles to begin with, single switch environments, rack-scale architectures, or standalone servers with wicked fast NVMe drives connected across the network via a Software Defined Storage abstraction layer. We are already seeing some hyperconvergence vendors flirt with NVMe and NVMe-oF as well. So, small deployments will likely be the first forays into NVMe-oF using Ethernet, and larger deployments will probably gravitate towards Fibre Channel, at least in the foreseeable time frame.

 

 

CLOSING THOUGHTS <NATHAN’S THOUGHTS>

 

As we closed out our conversation J made a comment about NVMe expanding our opportunity to address customer problems in new ways.

 

I can’t agree more. In my mind, NVMe can and should serve as a tipping point that forces us, vendors, to rethink our approach to storage and how devices in the data path interoperate.

 

This applies to everything from the hardware architecture of storage arrays; to how / when / where data services are implemented; even to the way devices communicate. I have some thoughts around digital force feedback, where an IT infrastructure resists a proposed change and responds with a more optimal configuration in real time (imagine pushing a capacity allocation to an array from your mobile phone, feeling the pressure of it resisting, then seeing green lights over more optimal locations and details on why the change is proposed), but that is a blog for a day when I have time to draw pictures.

 

The net is that as architects, administrators and vendors we should view NVMe as an opportunity for change and consider what we keep vs. what we change – over time. As J points out, NVMe-oF is still maturing and so are the solutions that leverage it. So to you, dear reader:

 

  1. NVMe on HCI (hyper-converged infrastructure) is a great place to start today.
  2. External storage with NVMe can be implemented, but beware anyone who says their architecture is future proof or optimized to take full advantage of NVMe (J’s comment on overloading CPUs is a perfect example of why).
  3. Think beyond the box. Invest in an analytics package that looks at the entire data path and lets you understand where bottlenecks exist.

 

Good hunting.

 

NVMe 101 – What’s Coming to the World of Flash?

Is NVMe Killing Shared Storage?

NVMe and Me: NVMe Adoption Strategies

NVMe and Data Protection: Time to Rethink Strategies

NVMe and Data Protection: Is Synchronous Replication Dead?

How NVMe is Changing Networking (with Brocade)

Hitachi Vantara Storage Roadmap Thoughts

An In Depth Conversation with Brocade

 

As we've discussed over the last several blogs, NVMe is much more than a communication protocol. It’s a catalyst for change. A catalyst that touches every aspect of the data path.

 

At Hitachi we understand that customers have to consider each of these areas, and so today we’re bringing in a heavy hitter from Brocade to cover their view of how data center network design changes – and doesn't change – with the introduction of NVMe.

 

The heavy hitter in this case is Curt Beckmann, principal architect for storage networking. A guy who makes me, someone who used to teach SEs how to build and debug FC SANs, feel like a total FC newbie. He's also a humanitarian, on the board of Village Hope, Inc. Check it out.

 

Let’s dig in.

 

Nathan: Does NVMe have a big impact on data center network design?

 

Curt: Before I answer, we should probably be precise. NVMe is used to communicate over a local PCIe bus to a piece of flash media (see Mark's NVMe overview blog for more). What we want to focus on is NVMe over Fabrics, NVMe-oF. It's the version of NVMe used when communicating beyond the local PCIe bus.

 

Nathan: Touché. With that in mind, does NVMe-oF have a big impact on network design?

 

Curt: It really depends on how you implement NVMe-oF. If you use a new protocol that changes how a host interacts with a target NVMe device, you may need to make changes to your network environment. If you're encapsulating NVMe in existing storage protocols like FC, though, you may not need to change your network design at all.

 

Nathan: New protocols. You’re referring to RDMA based NVMe-oF protocols, right?

 

Curt: Yes. NVMe over Fabrics protocols that use RDMA (iWARP or RoCE) reduce IP network latency by talking directly to memory. For NVMe devices that can expose raw media, RDMA can bypass CPU processing on the storage controller. This allows faster, more 'direct' access between host and media. It does, however, require changes to the way networks are designed.
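
 

To make that bypass point a bit more concrete, here is a minimal sketch in Python of a toy per-IO latency model. Every constant in it is an assumption picked purely for illustration, not a measurement from any vendor: a conventional kernel network path pays a system call, an interrupt and a couple of data copies on every IO, while an RDMA path lets the NIC place data directly into registered memory.

    # Toy per-IO latency model: kernel TCP path vs. RDMA path.
    # Every constant below is an assumption chosen only to show the shape of the difference.

    def kernel_path_us(io_bytes, copy_ns_per_byte=0.05, syscall_us=2.0, interrupt_us=3.0):
        """System call + interrupt handling + two data copies (user <-> kernel <-> NIC)."""
        copy_us = 2 * io_bytes * copy_ns_per_byte / 1000.0
        return syscall_us + interrupt_us + copy_us

    def rdma_path_us(io_bytes, nic_dma_us=1.0):
        """NIC places data directly into pre-registered memory; no kernel copies (size ignored in this toy)."""
        return nic_dma_us

    for size in (512, 4096, 65536):
        print(f"{size:>6} B   kernel ~{kernel_path_us(size):5.2f} us   rdma ~{rdma_path_us(size):4.2f} us")

The absolute numbers are made up; the shape of the result is the point. The fixed per-IO software overhead dominates small IOs, which is exactly what RDMA is designed to strip out.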

 

Nathan: Can you expand on this? Why would network design need to change?

 

Curt: Both iWARP and RoCE are based on Ethernet and IP. Ethernet was designed around the idea that data may not always reach its target, or at least not in order, so it relies on higher-layer functions, traditionally TCP, to retry communications and reorder data. That's useful over the WAN, but sub-optimal in the data center. For storage operations, it's also the wrong strategy.

 

For a storage network, you need to make sure data is always flowing in order and is ‘lossless’ to avoid retries that add latency. To enable this, you have to turn on point-to-point flow control functions. Both iWARP and RoCE v2 use Explicit Congestion Notification (ECN) for this purpose. iWARP uses it natively. RoCE v2 added Congestion Notification Packets (CNP) to enable ECN to work over UDP. But:

 

      1. They aren't always 'automatic.' ECN has to be configured on a host. If it isn't, any unconfigured host will not play nice and can interfere with other hosts' performance.
      2. They aren't always running. Flow control turns on when the network is under load, and admins need to configure exactly WHEN it turns on. If ECN kicks in too late and traffic is still increasing, you get a 'pause' on the network and latency goes up for all hosts.
      3. They aren't precise. I could spend pages on flow control, but to keep things short, you should be aware that Fibre Channel enables a sender to know precisely how much buffer space remains before it needs to stop. Ethernet struggles here (see the sketch after this list).
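
 

Curt's third point is worth lingering on, so here is a deliberately over-simplified simulation, with every parameter an assumption chosen for illustration, of the difference between credit-based flow control (the Fibre Channel model, where the sender always knows how many receive buffers remain) and reactive congestion notification (the ECN model, where the sender only slows down after a mark makes it back):

    # Deliberately over-simplified model; every constant is an assumption for illustration.
    RECEIVER_BUFFERS = 8      # frames of buffer at the receiving port
    SEND_PER_TICK    = 3      # frames the sender would like to push each tick
    DRAIN_PER_TICK   = 1      # frames the receiver can drain each tick
    FEEDBACK_DELAY   = 3      # ticks before a congestion mark reaches the sender

    def credit_based(ticks=20):
        """FC-style: the sender holds buffer credits, so it knows exactly how much room is left."""
        credits, queue, overruns = RECEIVER_BUFFERS, 0, 0
        for _ in range(ticks):
            sent = min(credits, SEND_PER_TICK)   # can only send what the credits allow
            credits -= sent
            queue += sent
            overruns += max(0, queue - RECEIVER_BUFFERS)
            drained = min(queue, DRAIN_PER_TICK)
            queue -= drained
            credits += drained                   # each drained frame returns one credit
        return overruns                          # stays 0: the buffer cannot be overrun

    def ecn_based(ticks=20):
        """ECN-style: the sender keeps pushing until a congestion mark makes it back."""
        queue, overruns, backoff_at = 0, 0, None
        for t in range(ticks):
            if queue >= RECEIVER_BUFFERS // 2 and backoff_at is None:
                backoff_at = t + FEEDBACK_DELAY  # mark is in flight; the sender reacts later
            if backoff_at is None or t < backoff_at:
                queue += SEND_PER_TICK
            overruns += max(0, queue - RECEIVER_BUFFERS)   # pause / drop territory
            queue = min(queue, RECEIVER_BUFFERS)
            queue -= min(queue, DRAIN_PER_TICK)
        return overruns

    print("credit-based overruns:", credit_based())   # 0 with these assumptions
    print("ECN-based overruns:   ", ecn_based())      # > 0 with these assumptions

With these assumed numbers the credit-based sender can never overrun the receiver, while the ECN-based sender overshoots during the feedback delay. Real networks are far more nuanced, but that lag is the imprecision Curt is describing.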

 

There are protocol-specific considerations too. For instance, TCP-based protocols like iWARP start slow when communication paths are new or have been idle, and build to max performance. That adds latency any time communication is bursty.
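
 

A rough sketch of that slow-start cost, again with assumed numbers rather than measurements: classic TCP slow start doubles the congestion window each round trip from a small initial value, so a fresh or idle connection spends several round trips just ramping up before a burst is fully delivered.

    # Rough sketch: round trips needed to deliver a burst under classic TCP slow start.
    # Window sizes and RTT are assumptions for illustration, not measurements.

    def slow_start_rtts(burst_segments, initial_window=10):
        """RTTs to deliver a burst when the congestion window doubles each round trip."""
        sent, window, rtts = 0, initial_window, 0
        while sent < burst_segments:
            sent += window
            window *= 2
            rtts += 1
        return rtts

    burst_segments = 1024            # segments in the burst (assumed)
    rtt_us = 50                      # assumed data-center round trip, in microseconds
    print("warm path:", 1 * rtt_us, "us (window already open)")
    print("cold path:", slow_start_rtts(burst_segments) * rtt_us, "us of round trips just to ramp up")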

 

Nathan: So if I net it out, is it fair to say that NVMe over Ethernet is pretty complex today?

 

Curt: (Smiles). There's definitely a level of expertise needed. This isn't as simple as just hooking up some cables to existing switches. And since we have multiple RDMA standards that are still evolving (Azure is using a custom RoCE build, call it RoCE v3), admins will need to stay sharp. Which raises a point I forgot to mention: these new protocols require custom equipment.

 

Nathan: You can't deploy iWARP or RoCE protocols onto installed systems?

 

Curt: Not without a NIC upgrade. You need something called an R-NIC, an RDMA-capable NIC. There are a few vendors that have them, but they aren't fully qualified with every switch in production.

 

That's why you are starting to hear about NVMe over TCP. It's a separate NVMe transport, similar in spirit to iSCSI, that runs on existing NICs so you don't need new hardware. It isn't as fast, but it is interoperable with everything; you just need to worry about the network design complexities. You may see it ultimately eclipse the RDMA protocols and become the NVMe Ethernet protocol of choice.

 

Nathan: But what if I don't care, Curt? What if I have the expertise to configure flow control and plan hops / buffer management so I don't hit a network pause? What if R-NICs are fine by me? If I have a top-notch networking team, is NVMe over Fabrics with RDMA faster?

 

Curt: What you can say is that for Ethernet / IP networks, RDMA is faster than no RDMA. In a data center, most of your latency comes from the host stack (virtualization can change the amount of latency here) and a bit from the target storage stack (see Figure 1). That is why application vendors are designing applications to use a local cache for data that needs the lowest latency: no wire, lower latency. With hard disks, network latency was tiny compared to the disk, and array caching and spindle count could mask the latency of software features. This meant that you could use an array instead of internal drives. Flash is a game changer in this dynamic, because now the performance difference between internal and external flash is significant. Most latency continues to come from software features, which has prompted the move from the sluggish SCSI stack to faster NVMe.

  

Figure 1: Where Latency Comes From
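
 

As a back-of-the-envelope companion to Figure 1, the sketch below splits a single IO's latency into host stack, network, controller and media components. The numbers are illustrative assumptions only, chosen to show how the proportions shift as media gets faster:

    # Back-of-the-envelope companion to Figure 1. All values are illustrative assumptions.

    def budget(name, host_us, network_us, controller_us, media_us):
        total = host_us + network_us + controller_us + media_us
        share = 100.0 * (host_us + network_us + controller_us) / total
        print(f"{name:<20} total ~{total:>5.0f} us   non-media share ~{share:5.1f}%")

    budget("HDD array, SCSI",   host_us=100, network_us=20, controller_us=200, media_us=5000)
    budget("Flash array, SCSI", host_us=100, network_us=20, controller_us=200, media_us=100)
    budget("Flash array, NVMe", host_us=30,  network_us=20, controller_us=100, media_us=100)

With spinning disk the stack and wire are noise; with flash they become the bulk of every IO, which is Curt's point about software features and the move to NVMe.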

 

I've seen claims that RoCE can do small IOs, like 512 bytes, at maybe 1 or 2 microseconds less latency than NVMe over Fibre Channel when the queue depth is set to 1 or some other configuration not used in normal storage implementations. We have not been able to repeat these benchmarks, but this is the nature of comparing benchmarks. We were able to come very close to quoted RoCE numbers for larger IOs, like 4K. At those sizes and larger, the winner is the one with faster wire speed. This is where buyers have to be very careful. A comparison of 25G Ethernet to 16G FC is inappropriate. Ditto for 32G FC versus 40G Ethernet. A better comparison is 25G Ethernet to 32G FC, but even here check the numbers and the costs.
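
 

For readers who want the arithmetic behind that comparison, here is a quick sketch of serialization time for a 4 KB payload at roughly comparable usable line rates. The rates are approximations that ignore framing and other protocol overhead, so treat them as assumptions for illustration:

    # Serialization time for a 4 KB payload at roughly comparable usable line rates.
    # Rates are approximate and ignore framing / protocol overhead; treat them as assumptions.

    IO_BYTES = 4096

    links_gbps = {
        "16G FC":  14,    # ~14 Gbit/s usable
        "32G FC":  28,    # ~28 Gbit/s usable
        "25G Eth": 25,
        "40G Eth": 40,
    }

    for name, gbps in links_gbps.items():
        wire_us = IO_BYTES * 8 / (gbps * 1000)   # bits / (bits per microsecond)
        print(f"{name:<8} ~{wire_us:4.2f} us on the wire per 4 KB IO")

On these rough numbers, 25G Ethernet and 32G FC land within a few hundred nanoseconds of each other per 4 KB transfer, while pairing 25G Ethernet against 16G FC nearly doubles the wire time on the FC side, which is why that comparison is misleading.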

 

Nathan: Any closing thoughts?

Curt: One thing we didn't really cover is ease of deployment alongside existing systems. For instance, what if you want to use a single storage infrastructure to support NVMe-oF enabled hosts and 'classic' hosts that are using existing, SCSI-based protocols? With FC you can do that. You can use existing Gen 5 and Gen 6 switches and have servers that support multiple storage interface types. With Ethernet? Not so much. You need new NICs and quite possibly new switches too. Depending on who you speak with, DCB switches are either recommended, if you want decent performance, or required. I recommend you investigate.

 

CLOSING THOUGHTS <NATHAN’S THOUGHTS>

Every vendor has their own take on things, but I think Curt’s commentary brings to light some very interesting considerations when it comes to NVMe.

 

 

 

  1. Ecosystem readiness – With FC (and maybe future Ethernet protocols), NVMe may require minimal to no changes in your network resources (granted, a network speed upgrade may be advised). But with RDMA, components change, so check on implementation details and interop. Make sure the equipment cost of changing to a new protocol isn't higher than you expect.
  2. Standard readiness – Much like any new technology, standards are evolving. FC is looking to make the upgrade transparent and there may even be similar Ethernet protocols coming. If you use RDMA, great. Just be aware you may not be future-proofed. That can increase operational costs and result in upgrades sooner than you think.
  3. Networking expertise – With Ethernet, you may need to be more thoughtful about congestion and flow control design. This may mean reducing the maximum load on components of the network to prevent latency spikes. It can absolutely be done; you just need to be aware that NVMe over Fabrics with RDMA may increase operational complexity that could result in lower than expected performance / ROI. To be clear though, my Ethernet friends may have a different view. We'll have to discuss that with them.

 

Other than that, I’ll tell you what I told myself when I was a systems administrator. Do your homework. Examine what is out there and ask vendors for details on implementations. If you buy a storage solution that is built around what is available today, you may be designing for a future upgrade versus designing for the future. Beware vendors that say ‘future proof.’ That’s 100% pure marketing spin.